(57) Abstract

The invention relates to the transcriptional regulatory sequence (TRS) of carcinoembryonic antigen (CEA) and to a molecular chimaera
comprising the CEA TRS and DNA encoding a heterologous enzyme. The CEA TRS is capable of targeting expression of the heterologous
enzyme to CEA+ cells, and the heterologous enzyme is preferably an enzyme capable of catalysing the production of an agent cytotoxic or
cytostatic to CEA+ cells. For example, the enzyme may be cytosine deaminase, which is capable of catalysing formation of the cytotoxic
compound 5-fluorouracil from the non-toxic compound 5-fluorocytosine.

TRANSCRIPTIONAL REGULATORY SEQUENCE OF CARCINOEMBRYONIC ANTIGEN FOR EXPRESSION TARGETING

The present invention relates to a transcriptional regulatory sequence useful in 
gene therapy. 

Colorectal carcinoma (CRC) is the second most frequent cancer and the second
leading cause of cancer-associated deaths in the United States and Western Europe. The
overall five-year survival rate for patients has not meaningfully improved in the last three
decades. Prognosis for the CRC patient is associated with the depth of tumor
penetration into the bowel wall, the presence of regional lymph node involvement and,
most importantly, the presence of distant metastases. The liver is the most common site
for distant metastasis and, in approximately 30% of patients, the sole initial site of tumor
recurrence after successful resection of the primary colon cancer. Hepatic metastases are
the most common cause of death in the CRC patient.

The treatment of choice for the majority of patients with hepatic CRC metastasis
is systemic or regional chemotherapy using 5-fluorouracil (5-FU) alone or in combination
with other agents such as levamisole. However, despite extensive effort, there is still no
satisfactory treatment for hepatic CRC metastasis. Systemic single- and combination-agent
chemotherapy and radiation are relatively ineffective, emphasizing the need for new
approaches and therapies for the treatment of the disease.

A gene therapy approach is being developed for primary and metastatic liver
tumors that exploits the transcriptional differences between normal and metastatic cells.
This approach involves linking the transcriptional regulatory sequences (TRS) of a
tumor-associated marker gene to the coding sequence of an enzyme, typically a
non-mammalian enzyme, to create an artificial chimaeric gene
that is selectively expressed in cancer cells. The enzyme
should be capable of converting a non-toxic prodrug into a
cytotoxic or cytostatic drug, thereby allowing for selective
elimination of metastatic cells.

The principle of this approach has been demonstrated
using an alpha-fetoprotein/Varicella Zoster virus thymidine
kinase chimaera to target hepatocellular carcinoma, with the
enzyme metabolically activating the non-toxic prodrug
6-methoxypurine arabinonucleoside, ultimately leading to
formation of the cytotoxic anabolite adenine
arabinonucleoside triphosphate (see Huber et al., Proc.
Natl. Acad. Sci. U.S.A., 88, 8039-8043 (1991) and EP-A-0 415 731).

For the treatment of hepatic metastases of CRC, it is
desirable to control the expression of an enzyme with the
transcriptional regulatory sequences of a tumor marker
associated with such metastases.

CEA is a tumor-associated marker that is regulated at
the transcriptional level and is expressed by most CRC
tumors but is not expressed in normal liver. CEA is widely
used as an important diagnostic tool for postoperative
surveillance, chemotherapy efficacy determinations,
immunolocalisation and immunotherapy. The TRS of CEA are
potentially of value in the selective expression of an
enzyme in CEA+ tumor cells since there appears to be a very
low heterogeneity of CEA within metastatic tumors, perhaps
because CEA may have an important functional role in
metastasis.

The cloning of the CEA gene has been reported and the
promoter localised to a region of 424 nucleotides upstream
from the translational start (Schrewe et al., Mol. Cell.
Biol., 10, 2738-2748 (1990)), but the full TRS was not
identified.

In the work on which the present invention is based, CEA genomic clones have 
been identified and isolated from the human chromosome 19 genomic library LL19NL01, 
ATCC number 57766, by standard techniques described hereinafter. The cloned CEA 
sequences comprise CEA enhancers in addition to the CEA promoter. The CEA 
enhancers are especially advantageous for high level expression in CEA-positive cells and 
no expression in CEA-negative cells. 

According to one aspect, the present invention provides a DNA molecule 
comprising the CEA TRS but without associated CEA coding sequence. 

According to another aspect, the present invention provides use of a CEA TRS
for targeting expression of a heterologous enzyme to CEA+ cells. Preferably the
enzyme is capable of catalysing the production of an agent cytotoxic or cytostatic to the
CEA+ target cells.

As described in more detail hereinafter, the present inventors have sequenced a
large part of the CEA gene upstream of the coding sequence. As used herein, the term
"CEA TRS" means any part of the CEA gene upstream of the coding sequence which
has a transcriptional regulatory effect on a heterologous coding sequence operably linked
thereto.

Certain parts of the sequence of the CEA gene upstream of the coding sequence
have been identified as making significant contributions to the transcriptional regulatory
effect, more particularly increasing the level and/or selectivity of transcription.
Preferably the CEA TRS includes all or part of the region from about -299b to about
+69b, more preferably about -90b to about +69b. Increases in the level of transcription
and/or selectivity can also be obtained by including one or more of the following regions:
-14.5kb to -10.6kb, preferably -13.6kb to -10.6kb, and/or -6.1kb to -3.8kb. All of the
regions referred to above can be included in either orientation and in different
combinations. In addition, repeats of these regions may be included, particularly repeats
of the -90b to +69b region, containing for example 2, 3, 4 or more copies of the region.
The base numbering refers to the sequence of Figure 6. The regions referred to are
included in the plasmids described in Figure 5B.
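
By way of illustration only, the approximate region boundaries listed above can be written down as coordinates relative to the Figure 6 numbering. The following Python sketch is not part of the disclosed work: the class name `Region`, the labels and the helper `repeat` are hypothetical, and every boundary is the "about" value quoted in the text rather than an exact coordinate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Approximate stretch of CEA upstream sequence (bases, Figure 6 numbering)."""
    label: str
    start: int  # 5' boundary; negative values lie upstream
    end: int    # 3' boundary

    def span(self) -> int:
        # Nominal distance between the two quoted boundaries.
        return self.end - self.start

    def contains(self, other: "Region") -> bool:
        # True if `other` lies entirely inside this region.
        return self.start <= other.start and other.end <= self.end


# Boundaries as quoted in the text; every value is approximate ("about").
REGIONS = {
    r.label: r
    for r in (
        Region("-299b..+69b", -299, 69),
        Region("-90b..+69b", -90, 69),
        Region("-14.5kb..-10.6kb", -14_500, -10_600),
        Region("-13.6kb..-10.6kb", -13_600, -10_600),
        Region("-6.1kb..-3.8kb", -6_100, -3_800),
    )
}


def repeat(region: Region, copies: int) -> list:
    # Model tandem repeats (e.g. 2, 3, 4 or more copies) as a simple list.
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return [region] * copies


if __name__ == "__main__":
    for r in REGIONS.values():
        print(f"{r.label:>18}: nominal span {r.span()} b")
```

The sketch makes two relationships in the text explicit: each "more preferably" or "preferably" region is nested inside its broader counterpart, and a repeat construct is simply several copies of the same region.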

Gene therapy involves the stable integration of new genes into target cells and the
expression of those genes, once they are in place, to alter the phenotype of that particular
target cell (for review see Anderson, W.F. Science 226, 401-409 (1984) and
McCormick, D. Biotechnology 3, 689-693 (1985)). Gene therapy may be beneficial for
the treatment of genetic diseases that involve the replacement of one defective or missing
enzyme, such as hypoxanthine-guanine phosphoribosyl transferase in Lesch-Nyhan
disease, purine nucleoside phosphorylase in severe immunodeficiency disease, and
adenosine deaminase in severe combined immunodeficiency disease.

It has now been found that it is possible to selectively arrest the growth of, or
kill, mammalian carcinoma cells with prodrugs, i.e. chemical agents capable
of selective conversion to cytotoxic (causing cell death)
or cytostatic (suppressing cell multiplication and growth)
metabolites. This is achieved by the construction of a
molecular chimaera comprising a "target tissue-specific"
TRS that is selectively activated in target cells, such as
cancerous cells, and that controls the expression of a
heterologous enzyme. This molecular chimaera may be
manipulated via suitable vectors and incorporated into an
infective virion. Upon administration of an infective
virion containing the molecular chimaera to a host (e.g.,
mammal or human), the enzyme is selectively expressed in
the target cells. Administration of prodrugs (compounds
that are selectively metabolised by the enzyme into
metabolites that are either further metabolised to or are,
in fact, cytotoxic or cytostatic agents) can then result in
the production of the cytotoxic or cytostatic agent in situ
in the cancer cell. According to the present invention, the CEA
TRS provides the target tissue specificity.

Molecular chimaeras (recombinant molecules comprised
of unnatural combinations of genes or sections of genes)
and infective virions (complete viral particles capable of
infecting appropriate host cells) are well known in the art
of molecular biology.

A number of enzyme prodrug combinations may be used
for the above purpose, providing the enzyme is capable of
selectively activating the administered compound either
directly or through an intermediate to a cytostatic or
cytotoxic metabolite. The choice of compound will also
depend on the enzyme system used, but must be selectively
metabolised by the enzyme either directly or indirectly to
a cytotoxic or cytostatic metabolite. The term
heterologous enzyme, as used herein, refers to an enzyme
that is derived from or associated with a species which is
different from the host to be treated and which will
display the appropriate characteristics of the above
mentioned selectivity. In addition, it will also be
appreciated that a heterologous enzyme may also refer to an
enzyme that is derived from the host to be treated that has
been modified to have unique characteristics unnatural to
the host.

The enzyme cytosine deaminase (CD) catalyses the
deamination of cytosine to uracil. Cytosine deaminase is
present in microbes and fungi but absent in higher
eukaryotes. This enzyme catalyses the hydrolytic
deamination of cytosine and 5-fluorocytosine (5-FC) to
uracil and 5-fluorouracil (5-FU), respectively. Since
mammalian cells do not express significant amounts of
cytosine deaminase, they are incapable of converting 5-FC
to the toxic metabolite 5-FU and therefore 5-fluorocytosine
is nontoxic to mammalian cells at concentrations which are
effective for antimicrobial activity. 5-Fluorouracil is
highly toxic to mammalian cells and is widely used as an
anticancer agent.
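
As a worked check of the hydrolytic deamination described above, the two reactions (cytosine to uracil, and 5-FC to 5-FU) can be written as substrate + H2O -> product + NH3 and verified for atom balance. The short Python sketch below is illustrative only and is not taken from the specification; the molecular formulas are the standard ones for these compounds, and the function names are hypothetical.

```python
from collections import Counter

# Standard molecular formulas, expressed as atom counts.
FORMULAS = {
    "cytosine": Counter({"C": 4, "H": 5, "N": 3, "O": 1}),
    "uracil": Counter({"C": 4, "H": 4, "N": 2, "O": 2}),
    "5-FC": Counter({"C": 4, "H": 4, "F": 1, "N": 3, "O": 1}),
    "5-FU": Counter({"C": 4, "H": 3, "F": 1, "N": 2, "O": 2}),
    "H2O": Counter({"H": 2, "O": 1}),
    "NH3": Counter({"N": 1, "H": 3}),
}


def total_atoms(species):
    # Sum the atom counts over every species on one side of the reaction.
    total = Counter()
    for name in species:
        total += FORMULAS[name]
    return total


def is_balanced(reactants, products):
    # A reaction is balanced when both sides carry identical atom counts.
    return total_atoms(reactants) == total_atoms(products)


if __name__ == "__main__":
    print(is_balanced(["cytosine", "H2O"], ["uracil", "NH3"]))
    print(is_balanced(["5-FC", "H2O"], ["5-FU", "NH3"]))
```

In both cases the exocyclic amino group is replaced by a carbonyl oxygen taken from water and released as ammonia, which is why the prodrug and the drug differ by only one nitrogen, one hydrogen and one oxygen.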

In mammalian cells, some genes are ubiquitously
expressed. Most genes, however, are expressed in a
temporal and/or tissue-specific manner, or are activated in
response to extracellular inducers. For example, certain
genes are actively transcribed only at very precise times
in ontogeny in specific cell types, or in response to some
inducing stimulus. This regulation is mediated in part by
the interaction between transcriptional regulatory
sequences (for example, promoter and enhancer regulatory
DNA sequences) and sequence-specific, DNA-binding
transcriptional protein factors.

It has now been found that it is possible to alter
certain mammalian cells, e.g. colorectal carcinoma cells,
metastatic colorectal carcinoma cells and hepatic
colorectal carcinoma cells, to selectively express a
heterologous enzyme as hereinbefore defined, e.g. CD. This
is achieved by the construction of molecular chimaeras in
an expression cassette.

Expression cassettes themselves are well known in the
art of molecular biology. Such an expression cassette
contains all essential DNA sequences required for
expression of the heterologous enzyme in a mammalian cell.
For example, a preferred expression cassette will contain
a molecular chimaera containing the coding sequence for CD,
an appropriate polyadenylation signal for a mammalian gene
(i.e., a polyadenylation signal that will function in a
mammalian cell), and CEA enhancers and promoter sequences
in the correct orientation.

Normally, two DNA sequences are required for the
complete and efficient transcriptional regulation of genes
that encode messenger RNAs in mammalian cells: promoters
and enhancers. Promoters are located immediately upstream
(5') from the start site of transcription. Promoter
sequences are required for accurate and efficient
initiation of transcription. Different gene-specific
promoters reveal a common pattern of organisation. A
typical promoter includes an AT-rich region called a TATA
box (which is located approximately 30 base pairs 5' to the
transcription initiation start site) and one or more
upstream promoter elements (UPEs). The UPEs are a
principal target for the interaction with sequence-specific
nuclear transcriptional factors. The activity of promoter
sequences is modulated by other sequences called enhancers.
The enhancer sequence may be a great distance from the
promoter in either an upstream (5') or downstream (3')
position. Hence, enhancers operate in an orientation- and
position-independent manner. However, based on similar
structural organisation and function that may be
interchanged, the absolute distinction between promoters
and enhancers is somewhat arbitrary. Enhancers increase
the rate of transcription from the promoter sequence. It
is predominantly the interaction of sequence-specific
transcriptional factors with the UPE and enhancer sequences
that enables mammalian cells to achieve tissue-specific gene
expression. The presence of these transcriptional protein
factors (tissue-specific, trans-activating factors) bound
to the UPE and enhancers (cis-acting, regulatory sequences)
enables other components of the transcriptional machinery,
including RNA polymerase, to initiate transcription with
tissue-specific selectivity and accuracy.

The transcriptional regulatory sequence for CEA is
suitable for targeting expression in colorectal carcinoma,
metastatic colorectal carcinoma and hepatic colorectal
metastases, as well as in transformed cells of the gastrointestinal
tract, lung, breast and other tissues. By placing the
expression of the gene encoding CD under the
transcriptional control of the CRC-associated marker gene,
CEA, the nontoxic compound, 5-FC, can be metabolically
activated to 5-FU selectively in CRC cells (for example,
hepatic CRC cells). An advantage of this system is that
the generated toxic compound, 5-fluorouracil, can diffuse
out of the cell in which it was generated and kill adjacent
tumor cells which did not incorporate the artificial gene
for cytosine deaminase.

In the work on which the present invention is based,
CEA genomic clones were identified and isolated from the
human chromosome 19 genomic library LL19NL01, ATCC number
57766, by standard techniques described hereinafter. The
cloned CEA sequences comprise CEA enhancers in
addition to the CEA promoter. The CEA enhancers are
especially advantageous for high level expression in
CEA-positive cells and no expression in CEA-negative cells.

The present invention further provides a molecular
chimaera comprising a CEA TRS and a DNA sequence
operatively linked thereto encoding a heterologous enzyme,
preferably an enzyme capable of catalysing the production
of an agent cytotoxic or cytostatic to the CEA+ cells.

The present invention further provides a molecular
chimaera comprising a DNA sequence containing the coding
sequence of the gene that codes for a heterologous enzyme
under the control of a CEA TRS in an expression cassette.

The present invention further provides in a preferred
embodiment a molecular chimaera comprising a CEA TRS which
is operatively linked to the coding sequence for the gene
encoding a non-mammalian cytosine deaminase (CD). The
molecular chimaera comprises a promoter and additionally
comprises an enhancer.

In particular, the present invention provides a
molecular chimaera comprising a DNA sequence of the coding
sequence of the gene coding for the heterologous enzyme,
which is preferably CD, additionally including an
appropriate polyadenylation sequence, which is linked
downstream in a 3' position and in the proper orientation
to a CEA TRS. Most preferably the expression cassette also
contains an enhancer sequence.

Preferably non-mammalian CD is selected from the group
consisting of bacterial, fungal, and yeast cytosine
deaminase.

The molecular chimaera of the present invention may be
made utilizing standard recombinant DNA techniques.

Another aspect of the invention is the genomic CEA 
sequence as described by Seq ID1. 






The coding sequence of CD and a polyadenylation signal
(for example, see Seq IDs 1 and 2) are placed in the
proper 3' orientation to the essential CEA transcriptional
regulatory elements. This molecular chimaera enables the
selective expression of CD in cells or tissue that normally
express CEA. Expression of the CD gene in mammalian CRC
and metastatic CRC (hepatic colorectal carcinoma
metastases) will enable nontoxic 5-FC to be selectively
metabolised to cytotoxic 5-FU.

Accordingly, in another aspect of the present
invention, there is provided a method of constructing a
molecular chimaera comprising linking a DNA sequence
encoding a heterologous enzyme, e.g. CD, to a CEA TRS.

In particular the present invention provides a method
of constructing a molecular chimaera as herein defined, the
method comprising ligating a DNA sequence encoding the
coding sequence and polyadenylation signal of the gene for
a heterologous enzyme (e.g. non-mammalian CD) to a CEA TRS
(e.g., promoter sequence and enhancer sequence).

These molecular chimaeras can be delivered to the
target tissue or cells by a delivery system. For
administration to a host (e.g., mammal or human), it is
necessary to provide an efficient in vivo delivery system
that stably incorporates the molecular chimaera into the
cells. Known methods utilize techniques of calcium
phosphate transfection, electroporation, microinjection,
liposomal transfer, ballistic barrage, DNA viral infection
or retroviral infection. For a review of this subject see
Biotechniques 6, No. 7 (1988).

The technique of retroviral infection of cells to
integrate artificial genes employs retroviral shuttle
vectors which are known in the art (Miller A.D., Buttimore
C. Mol. Cell. Biol. 6, 2895-2902 (1986)). Essentially,
retroviral shuttle vectors (retroviruses comprising
molecular chimaeras used to deliver and stably integrate
the molecular chimaera into the genome of the target cell)
are generated using the DNA form of the retrovirus
contained in a plasmid. These plasmids also contain
sequences necessary for selection and growth in bacteria.
Retroviral shuttle vectors are constructed using standard
molecular biology techniques well known in the art.
Retroviral shuttle vectors have the parental endogenous
retroviral genes (e.g., gag, pol and env) removed from the
vectors and the DNA sequence of interest is inserted, such
as the molecular chimaeras that have been described. The
vectors also contain appropriate retroviral regulatory
sequences for viral encapsidation, proviral insertion into
the target genome, message splicing, termination and
polyadenylation. Retroviral shuttle vectors have been
derived from the Moloney murine leukaemia virus (Mo-MLV)
but it will be appreciated that other retroviruses can be
used, such as the closely related Moloney murine sarcoma
virus. Other DNA viruses may also prove to be useful as
delivery systems. The bovine papilloma virus (BPV)
replicates extrachromosomally, so that delivery systems
based on BPV have the advantage that the delivered gene is
maintained in a nonintegrated manner.

Thus according to a further aspect of the present
invention there is provided a retroviral shuttle vector
comprising the molecular chimaeras as hereinbefore
defined.

The advantages of a retroviral-mediated gene transfer
system are the high efficiency of the gene delivery to the
targeted tissue or cells, sequence-specific integration
with respect to the viral genome (at the 5' and 3' long terminal
repeat (LTR) sequences) and little rearrangement of
delivered DNA compared to other DNA delivery systems.

Accordingly in a preferred embodiment of the present
invention there is provided a retroviral shuttle vector
comprising a DNA sequence comprising a 5' viral LTR
sequence, a cis-acting psi-encapsidation sequence, a
molecular chimaera as hereinbefore defined and a 3' viral
LTR sequence.

In a preferred embodiment, and to help eliminate
non-tissue-specific expression of the molecular chimaera, the
molecular chimaera is placed in opposite transcriptional
orientation to the 5' retroviral LTR. In addition, a
dominant selectable marker gene may also be included that
is transcriptionally driven from the 5' LTR sequence. Such
a dominant selectable marker gene may be the bacterial
neomycin-resistance gene NEO (aminoglycoside 3'
phosphotransferase type II), which confers on eukaryotic cells
resistance to the neomycin analogue Geneticin (antibiotic
G418 sulphate; registered trademark of GIBCO). The NEO
gene aids in the selection of packaging cells that contain
these sequences.
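
The preferred vector layout described above (5' LTR, psi-encapsidation sequence, molecular chimaera in opposite orientation, optional NEO marker driven from the 5' LTR, 3' LTR) can be summarised as an ordered list of elements with a strand flag. The following Python sketch is purely illustrative: the names `Element` and `check_layout` are hypothetical, and the position of NEO relative to the chimaera is an assumption made for the example, since the text does not specify it.

```python
from typing import NamedTuple


class Element(NamedTuple):
    name: str
    strand: str  # "+" = same transcriptional orientation as the 5' LTR, "-" = opposite


# One possible arrangement; the placement of NEO is an assumption.
PREFERRED_VECTOR = [
    Element("5' LTR", "+"),
    Element("psi", "+"),
    Element("NEO", "+"),
    Element("CEA TRS / CD chimaera", "-"),
    Element("3' LTR", "+"),
]


def check_layout(vector):
    """Return a list of deviations from the preferred embodiment (empty if none)."""
    problems = []
    names = [e.name for e in vector]
    if not names or names[0] != "5' LTR":
        problems.append("vector does not start with a 5' LTR")
    if not names or names[-1] != "3' LTR":
        problems.append("vector does not end with a 3' LTR")
    if "psi" not in names:
        problems.append("psi-encapsidation sequence is missing")
    chimaeras = [e for e in vector if "chimaera" in e.name]
    if not chimaeras:
        problems.append("molecular chimaera is missing")
    elif any(e.strand != "-" for e in chimaeras):
        problems.append("chimaera is not in opposite orientation to the 5' LTR")
    return problems


if __name__ == "__main__":
    print(check_layout(PREFERRED_VECTOR))
```

Recording orientation explicitly reflects the stated design reason: placing the chimaera antisense to the 5' LTR is intended to reduce LTR-driven, non-tissue-specific expression of the enzyme.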

The retroviral vector is preferably based on the
Moloney murine leukaemia virus but it will be appreciated
that other vectors may be used. Vectors containing a NEO
gene as a selectable marker have been described, for
example, the N2 vector (Eglitis M.A., Kantoff P., Gilboa
E., Anderson W.F. Science 230, 1395-1398 (1985)).

A theoretical problem associated with retroviral
shuttle vectors is the potential of retroviral long
terminal repeat (LTR) regulatory sequences
transcriptionally activating a cellular oncogene at the
site of integration in the host genome. This problem may
be diminished by creating SIN vectors. SIN vectors are
self-inactivating vectors that contain a deletion
comprising the promoter and enhancer regions in the
retroviral LTR. The LTR sequences of SIN vectors do not
transcriptionally activate 5' or 3' genomic sequences. The
transcriptional inactivation of the viral LTR sequences
diminishes insertional activation of adjacent target cell
DNA sequences and also aids in the selected expression of
the delivered molecular chimaera. SIN vectors are created
by removal of approximately 299 bp in the 3' viral LTR
sequence (Gilboa E., Eglitis M.A., Kantoff P.W., Anderson
W.F. Biotechniques 4, 504-512 (1986)).

Thus preferably the retroviral shuttle vectors of the 
present invention are SIN vectors. 

Since the parental retroviral gag, pol, and env genes
have been removed from these shuttle vectors, a helper
virus system may be utilised to provide the gag, pol, and
env retroviral gene products in trans to package or
encapsidate the retroviral vector into an infective virion.
This is accomplished by utilising specialised "packaging"
cell lines, which are capable of generating infectious,
synthetic virus yet are deficient in the ability to produce
any detectable wild-type virus. In this way the artificial
synthetic virus contains a chimaera of the present
invention packaged into synthetic artificial infectious
virions free of wild-type helper virus. This is based on
the fact that the helper virus that is stably integrated
into the packaging cell contains the viral structural
genes, but is lacking the psi-site, a cis-acting regulatory
sequence which must be contained in the viral genomic RNA
molecule for it to be encapsidated into an infectious viral
particle.

Accordingly, in a still further aspect of the present
invention, there is provided an infective virion comprising
a retroviral shuttle vector, as hereinbefore described,
said vector being encapsidated within viral proteins to
create an artificial, infective, replication-defective
retrovirus.

In another aspect of the present invention there is
provided a method for producing infective virions of the
present invention by delivering the artificial retroviral
shuttle vector comprising a molecular chimaera of the
invention, as hereinbefore described, into a packaging cell
line.

The packaging cell line may have stably integrated
within it a helper virus lacking a psi-site and other
regulatory sequence, as hereinbefore described, or,
alternatively, the packaging cell line may be engineered so
as to contain helper virus structural genes within its
genome. In addition to removal of the psi-site, additional
alterations can be made to the helper virus LTR regulatory
sequences to ensure that the helper virus is not packaged
in virions and is blocked at the level of reverse
transcription and viral integration. Alternatively, helper
virus structural genes (i.e., gag, pol and env) may be
individually and independently transferred into the
packaging cell line. Since these viral structural genes
are separated within the packaging cell's genome, there is
little chance of covert recombinations generating wild-type
virus.

The present invention also provides a packaging cell
line comprising an infective virion, as described
hereinbefore, said virion further comprising a retroviral
shuttle vector.

The present invention further provides for a packaging
cell line comprising a retroviral shuttle vector as
described hereinbefore.

In addition to retroviral-mediated gene delivery of
the chimaeric, artificial, therapeutic gene, other gene
delivery systems known to those skilled in the art can be
used in accordance with the present invention. These other
gene delivery systems include other viral gene delivery
systems known in the art, such as the adenovirus delivery
systems.

Non-viral delivery systems can be utilized in
accordance with the present invention as well. For
example, liposomal delivery systems can deliver the
therapeutic gene to the tumor site via a liposome.
Liposomes can be modified to evade metabolism and/or to
have distinct targeting mechanisms associated with them.
For example, liposomes which have antibodies incorporated
into their structure, such as antibodies to CEA, can have
targeting ability to CEA-positive cells. This will
increase both the selectivity of the present invention as
well as its ability to treat disseminated disease
(metastasis).

Another gene delivery system which can be utilized
according to the present invention is receptor-mediated
delivery, wherein the gene of choice is incorporated into
a ligand which recognizes a specific cell receptor. This
system can also deliver the gene to a specific cell type.
Additional modifications can be made to this
receptor-mediated delivery system, such as incorporation of
adenovirus components to the gene so that the gene is not
degraded by the cellular lysosomal compartment after
internalization by the receptor.

The infective virion or the packaging cell line
according to the invention may be formulated by techniques
well known in the art and may be presented as a formulation
(composition) with a pharmaceutically acceptable carrier
therefor. Pharmaceutically acceptable carriers, in this
instance physiologic aqueous solutions, may comprise liquid
medium suitable for use as vehicles to introduce the
infective virion into a host. An example of such a carrier
is saline. The infective virion or packaging cell line may
be a solution or suspension in such a vehicle. Stabilizers
and antioxidants and/or other excipients may also be
present in such pharmaceutical formulations (compositions),
which may be administered to a mammal by any conventional
method (e.g., oral or parenteral routes). In particular,
the infective virion may be administered by intravenous or
intra-arterial infusion. In the case of treating hepatic
metastatic CRC, intra-hepatic arterial infusion may be
advantageous. The packaging cell line can be administered
directly to the tumor or near the tumor and thereby produce
infective virions directly at or near the tumor site.

Accordingly, the present invention provides a
pharmaceutical formulation (composition) comprising an
infective virion or packaging cell line according to the
invention in admixture with a pharmaceutically acceptable
carrier.

Additionally, the present invention provides methods
of making pharmaceutical formulations (compositions), as
herein described, comprising mixing an artificial infective
virion, containing a molecular chimaera according to the
invention as described hereinbefore, with a
pharmaceutically acceptable carrier.

The present invention also provides methods of making
pharmaceutical formulations (compositions), as herein
described, comprising mixing a packaging cell line,
containing an infective virion according to the invention
as described hereinbefore, with a pharmaceutically
acceptable carrier.

Although any suitable compound that can be selectively
converted to a cytotoxic or cytostatic metabolite by the
enzyme cytosine deaminase may be utilised, the preferred
compound for use according to the invention is 5-FC, in
particular for use in treating colorectal carcinoma (CRC),
metastatic colorectal carcinoma, or hepatic CRC metastases.
5-FC, which is non-toxic and is used as an antifungal, is
converted by CD into the established cancer therapeutic
5-FU.



Any agent that can potentiate the antitumor effects of
5-FU can also potentiate the antitumor effects of 5-FC
since, when used according to the present invention, 5-FC
is selectively converted to 5-FU. According to another
aspect of the present invention, agents such as leucovorin
and levamisole, which can potentiate the antitumor effects
of 5-FU, can also be used in combination with 5-FC when
5-FC is used according to the present invention. Other
agents which can potentiate the antitumor effects of 5-FU
are agents which block the metabolism of 5-FU. Examples of
such agents are 5-substituted uracil derivatives, for
example, 5-ethynyluracil and 5-bromovinyluracil
(PCT/GB91/01650 (WO 92/04901); Cancer Research 46, 1094
(1986), which are incorporated herein by reference in their
entirety). Therefore, a further aspect of the present
invention is the use of an agent which can potentiate the
antitumor effects of 5-FU, for example, a 5-substituted
uracil derivative such as 5-ethynyluracil or
5-bromovinyluracil, in combination with 5-FC when 5-FC is used
according to the present invention. The present invention
further includes the use of agents which are metabolised in
vivo to the corresponding 5-substituted uracil derivatives
described hereinbefore (see Biochemical Pharmacology 38,
2885 (1989), which is incorporated herein by reference in
its entirety) in combination with 5-FC when 5-FC is used
according to the present invention.

5-FC is readily available (e.g., United States 
Biochemical, Sigma) and well known in the art. Leucovorin 
and levemisol are also readily available and well known in 
the art. 


Two significant advantages of the enzyme/prodrug
combination of cytosine deaminase/5-fluorocytosine and
further aspects of the invention are the following:

1. The metabolic conversion of 5-FC by CD produces 5-FU,
which is the drug of choice in the treatment of many
different types of cancers, such as colorectal carcinoma.

2. The 5-FU that is selectively produced in one cancer
cell can diffuse out of that cell and be taken up by both
non-facilitated diffusion and facilitated diffusion into
adjacent cells. This produces a neighbouring cell killing
effect. This neighbour cell killing effect alleviates the
necessity for delivery of the therapeutic molecular chimera
to every tumor cell. Rather, delivery of the molecular
chimera to a certain percentage of tumor cells can produce
the complete eradication of all tumor cells.

The amounts and precise regimen in treating a mammal
will of course be the responsibility of the attendant
physician, and will depend on a number of factors including
the type and severity of the condition to be treated.
However, for hepatic metastatic CRC, an intrahepatic
arterial infusion of the artificial infective virion at a
titer of between 2 x 10^5 and 2 x 10^7 colony forming units
per ml (CFU/ml) infective virions is suitable for a typical
tumour. Total amount of virions infused will be dependent
on tumour size and is preferably given in divided doses.

Likewise, the packaging cell line is administered
directly to a tumor in an amount of between 2 x 10^5 and
2 x 10^7 cells. Total amount of packaging cell line infused
will be dependent on tumour size and is preferably given in
divided doses.

Prodrug treatment - Subsequent to infection with the
infective virion, certain cytosine compounds (prodrugs of
5-FU) are converted by CD to cytotoxic or cytostatic
metabolites (e.g. 5-FC is converted to 5-FU) in target
cells. The above mentioned prodrug compounds are
administered to the host (e.g. mammal or human) between six
hours and ten days, preferably between one and five days,
after administration of the infective virion.

The dose of 5-FC to be given will advantageously be in
the range 10 to 500 mg per kg body weight of recipient per
day, preferably 50 to 500 mg per kg body weight of recipient
per day, more preferably 50 to 250 mg per kg body weight of
recipient per day, and most preferably 50 to 150 mg per kg
body weight of recipient per day. The modes of
administration of 5-FC in humans are well known to those
skilled in the art. Oral administration and/or constant
intravenous infusion of 5-FC are anticipated by the instant
invention to be preferable.
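The per-kilogram ranges above become absolute daily amounts once a body weight is fixed. The following is a minimal illustrative sketch of that arithmetic only, not dosing guidance; the function name, the tier labels and the 70 kg example weight are assumptions introduced here.

```python
# Illustrative arithmetic only. The tier labels are hypothetical names for
# the four ranges stated in the text (mg of 5-FC per kg body weight per day).
DOSE_TIERS_MG_PER_KG = {
    "advantageous": (10, 500),
    "preferred": (50, 500),
    "more_preferred": (50, 250),
    "most_preferred": (50, 150),
}


def daily_dose_range_mg(weight_kg, tier="most_preferred"):
    """Return (low, high) total daily 5-FC amount in mg for a body weight."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    low, high = DOSE_TIERS_MG_PER_KG[tier]
    return low * weight_kg, high * weight_kg


if __name__ == "__main__":
    # Assumed 70 kg recipient, used purely as an example.
    print(daily_dose_range_mg(70))
```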

The doses and mode of administration of leucovorin and
levemisol to be used in accordance with the present
invention are well known or readily determined by those
clinicians skilled in the art of oncology.

The dose and mode of administration of the
5-substituted uracil derivatives can be determined by the
skilled oncologist. Preferably, these derivatives are
given by intravenous injection or orally at a dose of
between 0.01 and 50 mg per kg body weight of the recipient
per day, particularly 0.01 to 10 mg per kg body weight per
day, and more preferably 0.01 to 0.4 mg per kg body weight
per day depending on the derivative used. An alternative
preferred administration regime is 0.5 to 10 mg per kg body
weight of recipient once per week.

The following examples serve to illustrate the present
invention but should not be construed as a limitation
thereof. In the Examples reference is made to the Figures,
a brief description of which is as follows:

Figure 1: Diagram of CEA phage clones. The
overlapping clones lambdaCEA1, lambdaCEA7, and lambdaCEA5
represent an approximately 26 kb region of CEA genomic
sequence. The 11,288 bp HindIII-Sau3A fragment that was
sequenced is represented by the heavy line under
lambdaCEA1. The 3774 bp HindIII-HindIII fragment that was
sequenced is represented by the heavy line under
lambdaCEA7. The bent arrows represent the transcription
start point for CEA mRNA. The straight arrows represent
the oligonucleotides CR15 and CR16. H, HindIII; S, SstI;
B, BamHI; E, EcoRI; X, XbaI.

Figure 2: Restriction map of part of lambdaCEA1. The
arrow head represents the approximate location of the
transcription initiation point for CEA mRNA. Lines below
the map represent the CEA inserts of pBS+ subclones. These
subclones are convenient sources for numerous CEA
restriction fragments.

DNA sequence of the 11,288 bp HindIII to
Sau3A fragment of lambdaCEA7 (SEQ ID NO: 1). Sequence is
numbered with the approximate transcription initiation point
for CEA mRNA as 0 (this start site is approximate because
there is some slight variability in the start site among
individual CEA transcripts). The translation of the first
exon is shown. Intron 1 extends from +172 to beyond +592.
Several restriction sites are shown above the sequence. In
subclone 109-3 the sequence at positions +70 has been
altered by site-directed mutagenesis in order to introduce
HindIII and EcoRI restriction sites.

DNA sequence of the 3774 bp HindIII to
HindIII fragment of lambdaCEA7 (SEQ ID NO: 2).

Figure 3: Mapplot of 15,056 bp HindIII to Sau3A
fragment from CEA genomic DNA showing consensus sequences.
Schematic representation of some of the consensus sequences
found in the CEA sequence of Seq IDs 1 and 2. The
consensus sequences shown here are from the transcriptional
dictionary of Locker and Buzard (DNA Sequence 1, 3-11
(1990)). The lysozymal silencer is coded B13. The last
line represents 90% homology to the topoisomerase II
cleavage consensus.
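The 15,056 bp length quoted for the map is consistent with the two sequenced fragments (3774 bp and 11,288 bp) being joined at a shared HindIII site, whose 6 bp recognition sequence (AAGCTT) would otherwise be counted twice. A minimal check of that arithmetic follows, assuming this is how the fragments overlap; the function name is introduced here for illustration.

```python
HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence


def joined_length(upstream_bp, downstream_bp, shared_site=HINDIII_SITE):
    """Length of two fragments joined at one shared restriction site.

    Assumes each fragment length includes the full recognition sequence
    at the junction, so the site length is subtracted once.
    """
    return upstream_bp + downstream_bp - len(shared_site)


if __name__ == "__main__":
    print(joined_length(3774, 11288))
```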

Figure 4: Cloning scheme for CEA constructs extending
from -299 bp to +69 bp.

Figure 5A: Cloning scheme for CEA constructs
extending from -10.7 kb to +69 bp.

Figure 5B: Coordinates for CEA sequence present in
several CEA/luciferase clones. CEA sequences were cloned
into the multiple cloning region of pGL2-Basic (Promega
Corp.) by standard techniques.

Figures 5C and 5D: Transient luciferase assays.
Transient transfections and luciferase assays were
performed in quadruplicate by standard techniques using
DOTAP (Boehringer Mannheim, Indianapolis, IN, USA),
luciferase assay system (Promega, Madison, WI, USA), and
Dynatech luminometer (Chantilly, VA, USA). CEA-positive
cell lines included LoVo (ATCC #CCL 229) and SW1463 (ATCC
#CCL 234). CEA-negative cell lines included HuH7 and Hep3B
(ATCC #HB 8064). C. Luciferase activity expressed as the
percent of pGL2-Control plasmid activity. D. Luciferase
activities of LoVo and SW1463 expressed as fold increase
over activity in Hep3B.
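The two normalisations described for panels C and D can be written out as follows. This is a sketch with made-up readings: the function names and every number below are hypothetical and are not data from the Figures.

```python
from statistics import mean


def percent_of_control(sample_readings, control_readings):
    """Mean sample luciferase activity as a percent of the control plasmid."""
    return 100.0 * mean(sample_readings) / mean(control_readings)


def fold_over_reference(cell_line_percent, reference_percent):
    """Fold increase of one cell line's activity over a reference line."""
    if reference_percent == 0:
        raise ZeroDivisionError("reference activity is zero")
    return cell_line_percent / reference_percent


if __name__ == "__main__":
    # Hypothetical quadruplicate light-unit readings, for illustration only.
    control = [1000, 1000, 1000, 1000]
    lovo = percent_of_control([400, 420, 380, 400], control)
    hep3b = percent_of_control([20, 20, 20, 20], control)
    print(lovo, hep3b, fold_over_reference(lovo, hep3b))
```

Expressing each construct first as a percent of the control plasmid in the same cell line is what allows lines with different transfection efficiencies to be compared in the second step.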

Example 1 

Construction of transcriptional regulatory sequence of
carcinoembryonic antigen/cytosine deaminase molecular
chimaera

A) Cloning and isolation of the transcriptional regulatory
sequence of the carcinoembryonic antigen gene

CEA genomic clones were identified and isolated from
the human chromosome 19 genomic library LL19NL01, ATCC
#57766, by standard techniques (Richards et al., Cancer
Research, 50, 1521-1527 (1990) which is herein incorporated
by reference in its entirety). The CEA clones were
identified by plaque hybridization to 32P end-labelled
oligonucleotides CR15 and CR16. CR15,
5'-CCCTGTGATCTCCAGGACAGCTCAGTCTC-3' (SEQ ID NO: 3), and CR16,
5'-GTTTCCTGAGTGATGTCTGTGTGCAATG-3' (SEQ ID NO: 4),
hybridize to a 5' non-transcribed region of CEA that has
little homology to other members of the CEA gene family.
Phage DNA was isolated from three clones that hybridized to
both oligonucleotide probes. Polymerase chain reaction,
restriction mapping, and DNA sequence analysis confirmed
that the three clones contained CEA genomic sequences. The
three clones are designated lambdaCEA1, lambdaCEA5, and
lambdaCEA7 and have inserts of approximately 13.5, 16.2,
and 16.7 kb respectively. A partial restriction map of the
three overlapping clones is shown in Figure 1.
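Basic properties of the two probes can be checked directly from the sequences given above. A minimal sketch: the helper names are introduced here for illustration, and the sequences are copied from SEQ ID NO: 3 and SEQ ID NO: 4 as printed in this section.

```python
CR15 = "CCCTGTGATCTCCAGGACAGCTCAGTCTC"  # SEQ ID NO: 3 as printed
CR16 = "GTTTCCTGAGTGATGTCTGTGTGCAATG"   # SEQ ID NO: 4 as printed

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of positions that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    for name, seq in (("CR15", CR15), ("CR16", CR16)):
        print(name, len(seq), round(gc_fraction(seq), 2), reverse_complement(seq))
```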






Clone lambdaCEA1 was initially chosen for extensive
analysis. Fragments isolated from lambdaCEA1 were subcloned
using standard techniques into the plasmid pBS+ (Stratagene
Cloning Systems, La Jolla, CA, USA) to facilitate
sequencing, site-directed mutagenesis, and construction of
chimeric genes. The inserts of some clones are represented
in Figure 2. The complete DNA sequence of a 11,288 bp
HindIII/Sau3A restriction fragment from lambdaCEA1
(SEQ ID NO: 1) was determined by the dideoxy sequencing
method using the dsDNA Cycle Sequencing System from Life
Technologies, Inc. and multiple oligonucleotide primers.
This sequence extends from -10.7 kb to +0.6 kb relative to
the start site of CEA mRNA. The sequence of a 3774 base
pair HindIII restriction fragment from lambdaCEA1 was also
determined (SEQ ID NO: 2). This sequence extends
from -14.5 kb to -10.7 kb relative to the start site of CEA
mRNA. This HindIII fragment is present in plasmid pCR145.



To determine important transcriptional regulatory
sequences various fragments of CEA genomic DNA are linked
to a reporter gene such as luciferase or chloramphenicol
acetyltransferase. Various fragments of CEA genomic DNA are
tested to determine the optimized, cell-type specific TRS
that results in high level reporter gene expression in
CEA-positive cells but not in CEA-negative cells. The various
reporter constructs, along with appropriate controls, are
transfected into tissue culture cell lines that express
high, low, or no CEA. The reporter gene analysis identifies
both positive and negative transcriptional regulatory
sequences. The optimized CEA-specific TRS is identified
through the reporter gene analysis and is used to
specifically direct the expression of any desired linked
coding sequence, such as CD or VZV TK, in cancerous cells
that express CEA. The optimized CEA-specific TRS, as used
herein, refers to any DNA construct that directs suitably
high levels of expression in CEA-positive cells and low or
no expression in CEA-negative cells. The optimized
CEA-specific TRS consists of one or several different fragments
of CEA genomic sequence or multimers of selected sequences
that are linked together by standard recombinant DNA
techniques. It will be appreciated by those skilled in the
art that the optimized CEA-specific TRS may also include
some sequences that are not derived from the CEA genomic
sequences shown in Seq IDs 1 and 2. These other sequences
may include sequences from adjoining regions of the CEA
locus, such as sequences from the introns, or sequences
further upstream or downstream from the sequenced DNA shown
in Seq IDs 1 and 2; or they could include transcriptional
control elements from other genes that when linked to
selected CEA sequences result in the desired CEA-specific
regulation.



The CEA sequences of Seq IDs 1 and 2 were computer
analyzed for characterized consensus sequences which have
been associated with gene regulation. Currently not enough
is known about transcriptional regulatory sequences to
accurately predict by sequence alone whether a sequence
will be functional. However, computer searches for
characterized consensus sequences can help identify
transcriptional regulatory sequences in uncharacterized
sequences since many enhancers and promoters consist of
unique combinations and spatial alignments of several
characterized consensus sequences as well as other
sequences. Since not all transcriptional regulatory
sequences have been identified and not all sequences that
are identical to characterized consensus sequences are
functional, such a computer analysis can only suggest
possible regions of DNA that may be functionally important
for gene regulation.
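A consensus search of the kind described can be sketched as a pattern scan over both strands. This is an illustrative sketch only: the IUPAC ambiguity table is standard, but the function names and any motif passed in are assumptions and are not taken from the transcriptional dictionary cited in this document. As the text notes, a hit only suggests a candidate region.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile a consensus string; the lookahead also reports overlapping hits."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile("(?=(" + body + "))")


def find_consensus(sequence, consensus):
    """Return sorted (start, strand, matched_text) hits.

    start is 0-based on the top strand; minus-strand hits are found by
    scanning the reverse complement and mapping the position back.
    """
    sequence = sequence.upper()
    pattern = consensus_to_regex(consensus)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(sequence)]
    rc = sequence.translate(_COMPLEMENT)[::-1]
    for m in pattern.finditer(rc):
        start = len(sequence) - m.start() - len(m.group(1))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)
```

Scanning both strands matters because many regulatory elements function in either orientation, so a top-strand-only search would miss half of the candidates.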

Some examples of the consensus sequences that are
present in the CEA sequence are shown in
Figure 3. Four copies of a lysozymal silencer consensus
sequence have been found in the CEA sequence. Inclusion of
one or more copies of this consensus sequence in the
molecular chimera can help optimize CEA-specific
expression. A cluster of topoisomerase II cleavage
consensus sequences identified approximately 4-5 kb upstream
of the CEA transcriptional start suggests that this region of
CEA sequence may contain important transcriptional regulatory
signals that may help optimize CEA-specific expression.

The first fragment of CEA genomic sequence analyzed for
transcriptional activity extends from -299 to +69, but it
is appreciated by those skilled in the art that other
fragments are tested in order to isolate a TRS that directs
strong expression in CEA-positive cells but little
expression in CEA-negative cells. As diagrammed in Figure
4, the 943 bp SmaI-HindIII fragment of plasmid 39-5-5 was
subcloned into the SmaI-HindIII sites of vector pBS+
(Stratagene Cloning Systems) creating plasmid 96-11.
Single-stranded DNA was rescued from cultures of XL1-blue
96-11 using an M13 helper virus by standard techniques.
Oligonucleotide CR70,
5'-CCTGGAACTCAAGCTTGAATTCTCCACAGAGGAGG-3' (SEQ ID NO: 5), was
used as a primer for oligonucleotide-directed mutagenesis
to introduce HindIII and EcoRI restriction sites at +65.
Clone 109-3 was isolated from the mutagenesis reaction and
was verified by restriction and DNA sequence analysis to
contain the desired changes in the DNA sequence. CEA
genomic sequences -299 to +69, original numbering Figure 3,
were isolated from 109-3 as a 381 bp EcoRI/HindIII
fragment. Plasmid pRc/CMV (Invitrogen Corporation, San
Diego, CA, USA) was restricted with AatII and HindIII and
the 4.5 kb fragment was isolated from low melting point
agarose by standard techniques. The 4.5 kb fragment of
pRc/CMV was ligated to the 381 bp fragment of 109-3 using
T4 DNA ligase. During this ligation the compatible HindIII
ends of the two different restriction fragments were
ligated. Subsequently the ligation reaction was
supplemented with the four deoxynucleotides, dATP, dCTP,
dGTP, and dTTP, and T4 DNA polymerase in order to blunt the
non-compatible AatII and EcoRI ends. After incubating,
phenol extracting, and ethanol precipitating the reaction,
the DNAs were again incubated with T4 DNA ligase. The
resulting plasmid, pCR92, allows the insertion of any
desired coding sequence into the unique HindIII site
downstream of the CEA TRS, upstream from a polyadenylation
site and linked to a dominant selectable marker. The
coding sequence for CD or other desirable effector or
reporter gene, when inserted in the correct orientation
into the HindIII site, is transcriptionally regulated by
the CEA sequences and is preferably expressed in cells
that express CEA but not in cells that do not express CEA.
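The two sites introduced by CR70 can be located in the primer itself. A minimal sketch: the primer string below is SEQ ID NO: 5 with every position read as A, C, G or T (an assumption wherever the printed copy is unclear), and the function name is hypothetical.

```python
CR70 = "CCTGGAACTCAAGCTTGAATTCTCCACAGAGGAGG"  # SEQ ID NO: 5, as read here

SITES = {"HindIII": "AAGCTT", "EcoRI": "GAATTC"}


def find_sites(seq, sites=SITES):
    """Map enzyme name -> list of 0-based start positions of its site."""
    found = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        found[enzyme] = positions
    return found


if __name__ == "__main__":
    print(find_sites(CR70))
```

Under this reading the HindIII and EcoRI sites sit directly next to each other in the middle of the primer, flanked on each side by sequence that can anneal to the template.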



In order to determine the optimized CEA TRS other
reporter gene constructs containing various fragments of
CEA genomic sequences are made by standard techniques from
DNA isolated from any of the CEA genomic clones (Figures 1,
2, 4, and 5). DNA fragments extending from the HindIII
site introduced at position +65 (original numbering Figure
3A) and numerous different upstream sites are isolated and
cloned into the unique HindIII site in plasmid
pSV0ALdelta5' (De Wet, J.R., et al. Mol. Cell. Biol., 7,
725-737 (1987) which is herein incorporated by reference in
its entirety) or any similar reporter gene plasmid to
construct luciferase reporter gene constructs, Figures 4
and 5. These and similar constructs are used in transient
expression assays performed in several CEA-positive and
CEA-negative cell lines to determine a strong, CEA-positive
cell-type specific TRS. Figures 5B, 5C, and 5D show the
results obtained from several CEA/luciferase reporter
constructs. The optimized TRS is used to regulate the
expression of CD or other desirable gene in a cell-type
specific pattern in order to be able to specifically kill
cancer cells. The desirable expression cassette is added
to a retroviral shuttle vector to aid in delivery of the
expression cassette to cancerous tissue.

Strains containing plasmids 39-5-5 and 39-5-2 were
deposited at the ATCC under the Budapest Treaty with
Accession Nos. 68904 and 68905, respectively. A strain
containing plasmid pCR92 was deposited with the ATCC under
the Budapest Treaty with Accession No. 68914. A strain
containing plasmid pCR145 was deposited at the ATCC under
the Budapest Treaty with Accession No. 69460.



B) Cloning and isolation of the E. coli gene encoding
cytosine deaminase (CD)

The cloning, sequencing and expression of E. coli CD
has already been published (Austin & Huber, Molecular
Pharmacology, 42, 380-387 (1993) the disclosure of which
is incorporated herein by reference). A positive genetic
selection was designed for the cloning of the codA gene
from E. coli. The selection took advantage of the fact
that E. coli is only able to metabolize cytosine via CD.
Based on this, an E. coli strain was constructed that could
only utilize cytosine as a pyrimidine source when cytosine
deaminase was provided in trans. This strain, BA101,
contains a deletion of the codAB operon and a mutation in
the pyrF gene. The strain was created by transducing a
pyrF mutation (obtained from the E. coli strain X3 2 (E.
coli Genetic Stock Center, New Haven, CT, USA)) into the
strain MBM7007 (W. Dallas, Burroughs Wellcome Co., NC, USA)
which carried a deletion of the chromosome from lac to
argF. The pyrF mutation confers a pyrimidine requirement
on the strain, BA101. In addition, the strain is unable to
metabolize cytosine due to the codAB deletion. Thus, BA101
is able to grow on minimal medium supplemented with uracil
but is unable to utilize cytosine as the sole pyrimidine
source.
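The selection logic can be summarised as a small truth table. This is a simplified sketch of the reasoning only: the function and argument names are hypothetical, and the model ignores everything except the pyrimidine requirement described above.

```python
def grows_on_minimal_medium(pyrF_mutant, has_cd, supplement):
    """Predict growth of a strain on minimal medium.

    supplement is "none", "uracil" or "cytosine". A pyrF mutant needs an
    external pyrimidine; cytosine is usable only when cytosine deaminase
    (CD) is present to convert it to uracil.
    """
    if not pyrF_mutant:
        return True  # pyrimidines can be made de novo
    if supplement == "uracil":
        return True
    if supplement == "cytosine":
        return has_cd
    return False


if __name__ == "__main__":
    # BA101 alone versus BA101 carrying a plasmid that encodes CD.
    for has_cd in (False, True):
        for supplement in ("none", "uracil", "cytosine"):
            print(has_cd, supplement,
                  grows_on_minimal_medium(True, has_cd, supplement))
```

The only condition that distinguishes the two strains is cytosine as the sole pyrimidine source, which is why growth on that medium serves as a positive selection.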

The construction of BA101 provided a means for
positive selection of DNA fragments encoding CD. The strain,
BA101, was transformed with plasmids carrying inserts from
the E. coli chromosome and the transformants were selected
for growth on minimal medium supplemented with cytosine.
Using this approach, the transformants were screened for
the ability to metabolize cytosine indicating the presence
of a DNA fragment encoding CD. Several sources of DNA
could be used for the cloning of the codA gene: 1) a
library of the E. coli chromosome could be purchased
commercially (for example from Clontech, Palo Alto, CA, USA
or Stratagene, La Jolla, CA, USA) and screened; 2)
chromosomal DNA could be isolated from E. coli, digested
with various restriction enzymes and ligated to plasmid
DNA with compatible ends before screening; and/or 3)
bacteriophage lambda clones containing mapped E. coli
chromosomal DNA inserts could be screened.

Bacteriophage lambda clones (Y. Kohara, National
Institute of Genetics, Japan) containing DNA inserts
spanning the 6-8 minute region of the E. coli chromosome
were screened for the ability to provide transient
complementation of the codA defect. Two clones, 137 and
123, were identified in this manner. Large-scale
preparations of DNA from these clones were isolated from
500 ml cultures. Restriction enzymes were used to generate
DNA fragments ranging in size from 10-12 kilobases. The
enzymes used were EcoRI, EcoRI and BamHI, and EcoRI and
HindIII. DNA fragments of the desired size were isolated
from preparative agarose gels by electroelution. The
isolated fragments were ligated to pBR322 (Gibco BRL,
Gaithersburg, MD, USA) with compatible ends. The resulting
ligation reactions were used to transform the E. coli
strain, DH5α (Gibco BRL, Gaithersburg, MD, USA). This step
was used to amplify the recombinant plasmids resulting from
the ligation reactions. The plasmid DNA preparations
isolated from the ampicillin-resistant DH5α transformants
were digested with the appropriate restriction enzymes to
verify the presence of insert DNA. The isolated plasmid
DNA was used to transform BA101. The transformed cells
were selected for resistance to ampicillin and for the
ability to metabolize cytosine. Two clones were isolated,
pEA001 and pEA002. The plasmid pEA001 contains an
approximately 10.8 kb EcoRI-BamHI insert while pEA002
contains an approximately 11.5 kb EcoRI-HindIII insert.
The isolated plasmids were used to transform BA101 to
ensure that the ability to metabolize cytosine was the
result of the plasmid and not due to a spontaneous
chromosomal mutation.



A physical map of the pEA001 DNA insert was generated
using restriction enzymes. Deletion derivatives of pEA001
were constructed based on this restriction map. The
resulting plasmids were screened for the ability to allow
BA101 to metabolize cytosine. Using this approach, the
codA gene was localized to a 4.8 kb EcoRI-BglII fragment.
The presence of codA within these inserts was verified by
enzymatic assays for CD activity. In addition, cell
extracts prepared for enzymatic assay were also examined by
polyacrylamide gel electrophoresis. Cell extracts that
were positive for enzymatic activity also had a protein
band migrating with an apparent molecular weight of 52,000.



The DNA sequence of both strands was determined for a
1634 bp fragment. The sequence determination began at the
PstI site and extended to the PvuII site, thus including the
codA coding domain. An open reading frame of 1283
nucleotides was identified. The thirty amino terminal
amino acids were confirmed by protein sequencing.
Additional internal amino acid sequences were generated
from CNBr-digestion of gel-purified CD.

A 200 bp PstI fragment was isolated that spanned the
translational start codon of codA. This fragment was
cloned into pBS+. Single-stranded DNA was isolated from 30
ml culture and mutagenized using a custom oligonucleotide BA22
purchased from Synthecell Inc., Rockville, MD, USA and the
oligonucleotide-directed mutagenesis kit (Amersham,
Arlington Heights, IL, USA). The base changes result in
the introduction of a HindIII restriction enzyme site for
joining of CD with CEA TRS and in a translational start
codon of ATG rather than GTG. The resulting 90 bp
HindIII-PstI fragment is isolated and ligated with the remainder
of the cytosine deaminase gene. The chimeric CEA TRS/cytosine
deaminase gene is created by ligating the HindIII-PvuII
cytosine deaminase-containing DNA fragment with the CEA TRS
sequences.

The strain BA101 and the plasmids, pEA001 and pEA003,
were deposited with ATCC under the Budapest Treaty with
Accession Nos. 55299, 63916, and 63915 respectively.

C) Construction of transcriptional regulatory sequence of
carcinoembryonic antigen/cytosine deaminase molecular
chimera

A 1508 bp HindIII/PvuII fragment containing the coding
sequence for cytosine deaminase is isolated from the
plasmid containing the full length CD gene of Example 1B
that has been altered to contain a HindIII restriction site
just 5' of the initiation codon. Plasmid pCR92 contains
CEA sequences -299 to +69 immediately 5' to a unique
HindIII restriction site and a polyadenylation signal 3' to
a unique ApaI restriction site (Example 1A, Figure 4).
pCR92 is linearised with ApaI, the ends are blunted using
dNTPs and T4 DNA polymerase, and subsequently digested with
HindIII. The pCR92 HindIII/ApaI fragment is ligated to the
1508 bp HindIII/PvuII fragment containing cytosine
deaminase. Plasmid pCEA-1/codA, containing CD inserted in
the appropriate orientation relative to the CEA TRS and
polyadenylation signal, is identified by restriction enzyme
and DNA sequence analysis.

The optimized CEA-specific TRS, the coding sequence
for CD with an ATG translation start, and a suitable
polyadenylation signal are joined together using standard
molecular biology techniques. The resulting plasmid,
containing CD inserted in the appropriate orientation
relative to the optimized CEA-specific TRS and a
polyadenylation signal, is identified by restriction enzyme
and DNA sequence analysis.

Example 2

Construction of a retroviral shuttle vector construct
containing the molecular chimera of Example 1

The retroviral shuttle vector pL-CEA-1/codA is constructed
by ligating a suitable restriction fragment containing the
optimized CEA TRS/codA molecular chimera including the
polyadenylation signal into an appropriate retroviral
shuttle vector, such as N2(XM5) linearised at the XhoI
site, using standard molecular biology techniques. The
retroviral shuttle vector pL-CEA-1/codA is characterized by
restriction endonuclease mapping and partial DNA
sequencing.

Example 3 

Virus Production of Retroviral Constructs of Example 2

The retroviral shuttle construct described in Example 2 is
placed into an appropriate packaging cell line, such as
PA317, by electroporation or infection. Drug resistant
colonies, such as those resistant to G418 when using
shuttle vectors containing the NEO gene, are single cell
cloned by the limiting dilution method, analyzed by
Southern blots, and titred in NIH 3T3 cells to identify the
highest producer of full-length virus.

Example 4 

Further data on the CEA TRS 

In addition to the plasmids shown in Figure 5B, the
following combinations of regions have proved particularly
advantageous for high level expression of the reporter gene
in the system described in Example 1A:

pCR177:
(-14.5kb to -10.6kb) + (-6.1kb to -3.9kb) + (-299b to +69b)

pCR176:
(-13.6kb to -10.6kb) + (-6.1kb to -3.9kb) + (-299b to +69b)

pCR165:
(-3.9kb to -6.1kb) + (4x -90b to +69b)

pCR168:
(-13.6kb to -10.6kb) + (4x -90b to +69b).
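The approximate amount of CEA sequence carried by each construct follows from the listed coordinates. This is a rough sketch under stated assumptions: segment length is taken as the absolute difference between the two coordinates, the kb values are used as rounded in the list, and the data structure and function name are introduced here for illustration.

```python
# Coordinates in bp relative to the CEA transcription start, as listed above.
# Each entry is (first_coordinate, second_coordinate, copies).
CONSTRUCTS = {
    "pCR177": [(-14500, -10600, 1), (-6100, -3900, 1), (-299, 69, 1)],
    "pCR176": [(-13600, -10600, 1), (-6100, -3900, 1), (-299, 69, 1)],
    "pCR165": [(-3900, -6100, 1), (-90, 69, 4)],
    "pCR168": [(-13600, -10600, 1), (-90, 69, 4)],
}


def approx_insert_bp(segments):
    """Sum of segment lengths times copy number (approximate, in bp)."""
    return sum(abs(b - a) * copies for a, b, copies in segments)


if __name__ == "__main__":
    for name, segments in CONSTRUCTS.items():
        print(name, approx_insert_bp(segments))
```

Because the upstream coordinates are given only to the nearest 0.1 kb, the totals are estimates of insert size rather than exact fragment lengths.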
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: The Wellcome Foundation Limited
(B) STREET: Unicorn House, 160 Euston Road
(C) CITY: London
(E) COUNTRY: G.B.
(F) POSTAL CODE (ZIP): NW1 2BP

(ii) TITLE OF INVENTION: Transcriptional Regulatory Sequence

(iii) NUMBER OF SEQUENCES: 5

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11288 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



AAGCTTAAAA CCCAATGGAT TGACAACATC AAGAGTTGGA ACAAGTGGAC ATGGAGATGT 60

TACTTGTGGA AATTTAGATG TGTTCAGCTA TCGGGCAGGA GAATCTGTGT CAAATTCCAG 120

CATGGTTCAG AAGAATCAAA AAGTGTCACA GTCCAAATGT GCAACAGTGC AGGGCAIAAA 180

ACTGTGGTGC ATTCAAACTG AGGGATATTT TGGAACATGA GAAAGGAAGG GATTGCTGCT 240

GCACAGAACA TGGATCATCT CACACATAGA CTTGAAAGAA AGGAGTCAAT CGCAGAAIAG 300

AAAATGATCA CTAATTCCAC CTCTATAAAG TTTCCAAGAG GAAAACCCAA TTCTGCTGCT 360

AGAGATCAGA ATGGAGGTGA CCTGTGCCTT CCAATGGCTG TGAGGGTCAC GGGAGTGTCA 420

CTTAGTGCAG GCAATGTGCC GTATCTTAAT CTGGGCAGGG CTTTCATGAG CACATAGGAA 480

TGCAGACATT ACTGCTGTGT TCATTTTACT TCACCGGAAA AGAAGAATAA AATCAGCCGG 540

GCGCGGTGGC TCACGCCTGT AATCCCAGCA CTTTAGAAGG CTGAGGTGGG CAGATTACTT 600

GAGGTCAGGA GTTCAAGACC ACCCTGGCCA ATATGGTGAA ACCCCGGCTC TACTAAAAAT 660

ACAAAAATTA GCTGCGCATG GTGGTGCGCG CCTGTAATCC CAGCTACTCG GGAGGCTGAG 720
GCTCGACAAT TGCTTCGACC CAGGAACCAG ACGTTCCAGT GACCCAAGAT TCTGCCACTG 780

CACTCCAGCT TGGGCAACAG AGCCAGACTC TGTAAAAAAA AAAAAAAAAA AAAAAAAAAG 840

AAAGAAAGAA AAAGAAAAGA AAGTATAAAA TCTCTTTGGG TTAACAAAAA AAGATCCACA 900

AAACAAACAC CAGCTCTTAT CAAACTTACA CAACTCTCCC AGACAACAGG AAACACAAAT 960

ACTCATTAAC TCACTTTTGT GGCAATAAAA CCTTCATGTC AAAAGGAGAC CAGGACAGAA 1020

TGAGGAAGTA AAACTGCAGG CCCTACTTGG GTGCAGAGAG GGAAAATCCA CAAATAAAAC 1080

AXTACCAGAA GGAGCTAAGA TTTACTGCAT rGAGTTCATT CCCCAGGTAT GCAAGGTGAT 1140

TTTAACACCT GAAAATCAAT CATTGCCTTT ACTACATAGA CAGATTAGCT AGAAAAAAAT 1200

TACAACTAGC AGAACAGAAG CAATTTGGCC TTCCTAAAAT TCCACATCAT ATCATCATGA 1260

TGGAGACAGT GCAGACSCCA ATGACAATAA AAAGAGGGAC CTCCSTCACC CGGTAAACAT 1320

GTCCACACAG CTCCAGCAAG CACCCGTCTT CCCAGTGAAT CACTGTAACC TCCCCTTTAA 1380



TCAGCCCGAG GCAAGGCTGC CTGCGATCGC CACACAGG CT GGCCTCAACC 1440

TCCCGCAGAG GCTCTCCTTT GGCCACCCCA • 4wvu AG AGC ATGAGGACAG GGCAGAGCCC 1500

TCTGATGCCC ACACATGGCA GGAGCTGACG CCAGAGCCAT GGGGGCTGGA GAGCAGAGCT 1560

GCTGGGGTCA GAGCTTCCTG AGGACACCCA GGCCTAAGGG AAGGGAGCTC CCTGGATGGG 1620

GGCAACCAGG CTCCGGGCTC CAACCTCAGA GCCCGCATGG GAGGAGCCAG CACTCTAGGC 1680

CTTTCCTAGG GTGACTCTGA CGGGACCCTC ACACGACAGG ATCGCTGAAT GCACCCSAGA 1740

TGAAGGGGCC ACCACGGGAC CCTGCTCTCG TCGCAGATCA GGAGAGAGTG GGACACCATG 1800

CCAGGCCCCC ATGGCATCGC TCCGACTGAC CCAGGCCACT CCCCTGCATG CATCAGCCTC 1860

GGTAAGTC-VC ATGACCAAGC CCAGGACCAA TGTGGAAGGA AGGAAACAGC ATCCCCTTTA 1920

GTGATGGAAC CCAAGGTCAG TGCAAAGAGA GGCCATGAGC AGTTAGGAAG GGTGGTCCAA 1980

CCTACACCAC AAACCATCAT CTATCATAAG TAGAAGCCCT CCTCCATGAC CCCTGCATTT 2040

AAATAAACGT TTGTTAAATG AGTCAAATTC CCTCACCATG AGAGCTCACC TGTGTCTAGG 2100

CCCATCACAC ACACAAACAC ACACAGACAC ACACAGACAC ACACAGACAC ACACAGGGAA 2160

AGTGCAGGAT CCTGGACAGC ACGAGGCAGG CTTCACAGGC AGAGCAAACA GCGTCAATGA 2220

CCCATGCAGT GCCCTGGGCC CCATCAGCTC AGAGACCCTG TGAGGGCTGA GATGGGGCTA 2280

GGCAGGGGAG AGACTTAGAG AGGGTGGGGC CTCCAGGGAG GGGGCTGCAG GGAGCTGGGT 2340

ACTGCCCTCC AGGGAGGGGG CTGCAGGGAG CTGGGTACTG CCCTCCAGGG AGGGGGCTGC 2400

AGGGAGCTGG GTACTGCCCT CCAGGGAGGG GGCTGCAGGG AGCTGGGTAC TGCCCTCCAG 2460

GGAGGGGGCT GCAGGGACCT GGGTACTGCC CTCCAGGGAG CCACGAGCAC TGTTCCCAAC 2520

AGAGAGCACA TCTTCCTGCA GCAGCTGCAC AGACACAGGA CCCCCCATGA CTCCCCTGGG 2580

CCAGGGTGTG GATTCCAAAT TTCCTGCCCC ATTGGGTGGG ACCGAGGTTG ACCCTGACAT 2640

CCAAGGGGCA TCTGTGATTC CAAACTTAAA CTACTGTGCC TACAAAATAG GAAATAACCC 2700

TACTTTTTCT ACTATCTCAA ATTCCCTAAG CACAAGCTAG CACCCTTTAA ATCAGGAACT 2760
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TCAGTCACTC CTGGGGTCCT CZCXZGCCCZ "AG7CTGAC? TGCAGGTGCA CAGGGTCGCT 2820 

GACA7CTGTC CTTGCTCCTC CrCTTGGCTC AACTCCCGCC CCTCCTGGGG G7GACTGATG 2880 

GTCAGGACAA GGGATCC7AG AGCTGGCCCC ATGATTGACA GGAAGGCAGG ACTTGGCC7C 2940 

CATTCTGAAG ACTAGGGGTG TCAAGAGAGC TGGGCATCCC ACAGAGCTGC ACAAGATGAC 3000 

GCGGACAGAG GGTGACACAG GGCTCAGGGC TTCAGACGGG TCGGGAGGCT CAGCTGAGAG 3060 

TTCAGGGACA GACCTGAGGA GCCTCAGTGG GAAAAGAAGC ACTGAAGTGG GAAGTTCTGG 3120

AATG~CTGG ACAACCCTSA CTGCTCTAAG GrJ^ATGCTCC CACCCCGATG TAGCCCGCAG 3180

CACTGGACGG TCTGTGTACC TCCCCGCTGC CGATCCTCTC ACAGCCCCCG CCTCT.-.GGGA 3240 

CACAACTCCT GCCCTAACA? GCATCTTTCC TGTCTCATTC CACACAAAAG GGCCTCTGGG 3300

GTCCCTGTTC TGCATTGCAA GGAGTGGAGG TCACGTTCCC ACAGACCACC CAGCAACAGG 3360

GTCCTATGGA GGTGCGGTCA GGAGGATCAC ACSTCCCCCC ATGCGCAGGG GACTGACTCT 3420

GGGGGTGAXG GAXTGGCCTG GAGGCCACTC GTCCCCTCTG TCCCTGAGGG GAATCTGCAC 3480

CCTGGAGGCT GCCACATCCC TCCTCATTCT TTCAGCTGAG GGCCCTTCTT GAAACCCCAG 3540

GGAGGACTCA ACCCCCACTC GGAAAGGCCC AGTGTGGACG GTTCCACAGC AGCCCACCTA 3600 

AGGCCCTTGG ACACAGATCC TGAGTGAGAG AACCTTTAGG GACACAGGTG CACGGCCATG 3660 

TCCCCAGTGC CCACACAGAG CAGGGGCATC TGGACCCTGA GTGTGTAGCT CCCGCGACTG 3720

AACCCAGCCC TTCCCCAATG ACGTGACCCC TGGGGTGGCT CCAGGTCTCC AGTCCATCCC 3780

ACCAAAATCT CCAGATTGAG GGTCCTCCCT TGACTCCCTG ATGCCTCTCC AGGAGCTCCC 3840

CCCTGAGCAA ATCTAGAGTG CAGAGGGCTG GGATTGTGGC AGTAAAAGCA GCCACATTTG 3900 

TCTCAGGAAG GAAAGGGAGG ACATGAGCTC CAGGAAGGGC GATGGCGTCC TCTAGTGGGC 3960

GCCTCCTGTT AATGAGCAAA AAGGGGCCAG GAGAGTTGAG AGATCAGGGC TGGCCTTGGA 4020 

CTAAGGCTCA GATGGAGACG ACTGAGGTGC AAAGAGGGGG CTGAAGTAGG GGAGTGGTCG 4080 

GGAGAGATGG GAGGAGCAGG TAAGGGGAAG CCCCAGGGAG GCCGGGGGAG GGTACACCAG 4140 

AGCTCTCCAC TCCTCAGCAT TGACATTTGG GCTGGTCGTG CTAGTGGGGT TCTGTAAGTT 4200 

GXAGGGTGTT CAGCACCATC TGGGGACTCT ACCCACTAAA TGCCAGCAGG ACTCCCTCCC 4260 

CAAGCTCTAA CAACCAACAA TGTCTCCAGA CTTTCCAAAT GTCCCCXGGA GAGCAAAATT 4320

GCTTCTGGCA GAATCACTGA TCTACGTCAG TCTCTAAAAG TCACTCATCA GCGAAATCCT 4380 

TCACCTCTTG GGAGAAGAAT CACAAGTGTG AGAGGGCTAG AAACTGCAGA CTTCAAAATC 4440 

TTTCCAAAAG AGTTTTACTT AATCAGCAGT TTGATGTCCC AGGAGAAGAT ACATTTACAG 4500 

TGTTTAGAGT TGATGCCACA TGGCTGCCTG TACCTCACAG CAGGAGCAGA GTGGGTTTTC 4560 

CAAGGGCCTG TAACCACAAC TGGAATGACA CTCACTGGGT TACATTACAA AGTGGAAXGT 4620 

CGGGAATTCT GTAGACTTTG GGAAGGGAAA TCTATGACGT CAGCCCACAG CCTAAGGCAG 4680

TCGACAGTCC ACTTTGAGGC TCTCACCATC TAGGAGACAT CTCAGCCATG AACATAGCCA 4740 

CATCTGTCAT TAGAAAACAT GTTTTATTAA GAGCAAAAAT CTACGCTAGA ACTGCTTTAT 4800

GCTCTTTTTT CTCTTTATGT TCAAATTCAT ATACTTTTAG ATCATTCCTT AAAGAAGAAT 4860

CTATCCCCCT AAGTAAACG? TATCACTGAC TGGATAGTGT TGGTGTCTCA CTCCCAACCC 4920

CTGTGTGGTG ACACTGCCCT GCTTCCCCAG CZZZ5GGCCC TCTCTGATTC CTGAGACCTT 4980

TGGGTGCTCC TTCATTAGGA GGAAGAGAGG AAGGGTCTTT TTAATATTCT CACCATTCAC 5040

CCATCCACCT CTTACACACT GGGAAGAATC AGTTGCCCAC TCTTGGATTT GATCCTC3AA 5100

TTAATGACCT CTATTTCTGT CCCTTGTCCA TTTCAACAAT GTGACAGGCC TAAGAGGTGC 5160

CTTCTCCATG TGATTTTTGA GGAGAAGGTT CTCAAGATAA GTTTTCTCAC ACCTCTTTGA 5220

ATTACCTCCA CCTGTGTCCC CATCACCATT ACCACCAGCA TTTGGACCCT TTTTCTSTTA 5280

GTCAGATGCT TTCCACCTCT TGAGGGTGTA TACTGTATGC TCTCTACACA GGAATATGCA 5340

GAGGAAA^AG AAAAAGGGAA ATCGCATTAC TATTCAGAGA GAAGAAGACC TTTATSTGAA 5400

TGAATGAGAG TCTAAAATCC TAAGAGAGCC CATATAAAAT TATTACCAGT GCTAAAACTA 5460

CAAAAGTTAC ACTAACAGTA AACTAGAATA ATAAAACATG CATCACAGTT GCTGGTAAAG 5520

CTAAATCAGA TATTTTTTTC TTAGAAAAAG CATTCCATGT GTGTTGCAGX GATGACAGGA 5580

GTGCCCTTCA GTCAATATCC TGCCTGTAAT TTTTGTTCCC TGGCAGAATG XATTCTCTTT 5640

TCXCCCTTTA AATCTTAAAT GCAAAACTAA AGGCAGCTCC TGGGCCCCCT CCCCAAAGTC 5700

AGCTGCCTGC AACCAGCCCC ACGAAGAGCA GAGGCCTGAG CTTCCCTGGT CAAAATAGGG 5760

GGCTAGGGAG CTTAACCTTG CTCGATAAAG CTGTGTTCCC AGAATGTCGC TCCTGTTCCC 5820

AGGGGCACCA GCCTGGAGGG TGGTGAGCCT CACTGGTGGC CTGATGCTTA CCTTGTGCCC 5880

TCACACCAGT GGTCACTGGA ACCTTGAAC2. CTTGGCXGTC GCCCGGATCT GCAGATCTCA 5940

AGAACTTCTG GAAGTCAAAT TACTGCCCAC TTCTCCAGGG CAGATACCTG TCAACATCCA 6000

AAACCATGCC ACAGAACCCT GCCTGGGGTC TACAACACAT ATGGACTGTG AGCACCAAGT 6060

CCAGCCCTGA ATCTGTGACC ACCTGCCAAG ATCCCCCTAA CTGGGATCCA CCAATCACTG 6120 

CACATGGCAG GCAGCGAGGC TTGGAGGTGC TTCGCCACAA GGCAGCCCCA ATTTGCTGGG 6180

AGTTTCTTGG CACCTGGTAG TGGTGAGGAG CCTTGGGACC CTCAGGATTA CTCCCCTiaA 6240

GCATAGTGGG GACCCTTCTG CATCCCCAGC AGGTGCCCCG CTCTTCAGAG CCTCTCTCTC 6300 

TGAGGTTTAC CCAGACCCCT GCACCAATGA CACCATGCTG AAGCCTCAGA CAGAGAGATG 6360 

GAGCTTTGAC CAGGAGCCGC TCTTCCTTGA GGGCCAGGGC AGGGAAAGCA GGAGGCAGCA 6420 

CCAGGAGTGG GAACACCAGT GTCTAAGCCC CTGATGAGAA CAGGGTCGTC TCTCCCai&T 6480

GCCCATACCA GGCCTGTGAA CAGAATCCTC CTTCTGCAGT GACAATGTCT GAGAGGACGA 6540

CATGXTTCCC AGCCTAACGT GCAGCCATGC CCATCTACCC ACTGCCTACT GCAGGACAGC 6600

ACCAACCCAG GAGCTGGGAA GCTGGGAGAA GACATGGAAT ACCCATGGCT TCTCACCTTC 6660

CTCCAGXCCA GTGGGCACCA TTTATGCCTA GGACACCCAC CTGCCGGCCC CACGCTCTTA 6720 

AGAGTTAGGT CACCTAGGTG CCTCTGGGAG GCCGAGGCAG GAGAATTGCT TGAACCCGGG 6780 

AGGCAGAGGT TGCAGTGAGC CGAGATCACA CCACTGCACT CCAGCCTGGG TGACACAATC 6840

AGACTCT3TC TCAAAAAAAA AGAGAAAGAT AGCATCAGTG GCTACCAAGG GCTAGGGGGA 6900

GGGGAAGGTG GAGAGTTAAT CATTAATAGT .-.rSrtAGTTTC TAXGTGAGAT GATGAAAATG 6960

TTCTGGAAAA AAAAATATAG TGGTGAGGAT GTAGAATATT GTGAATATAA TTAACGGCAT 7020

TTAATTGTAC ACTTAACATG ATTAATGTGG CATATTTTAT CTTATGTATT TGACTACATC 7080

CAAGAAACAC TGGGAGAGGG AAAGCCCACC ATGTAAAATA CACCCACCCT AATCAGATAG 7140

TCCTCATTCT ACCCAGGTAC AGGCCCCTCA T3ACCTGCAC AGGAATAACT AAGGATTTAA 7200

GGACATGAGG CTTCCCAGCC AACTGCAGGT GCACAACATA AATGTATCTG CAAACAGACT 7260

GAGAGTAAAG CTGGGGGCAC AAACCTCACC ACTGCCAGGA CACACACCCT TCTCGTGGAT 7320

TCTGACTTTA TCTGACCCGG CCCACTGTCC AGATCTTGTT GTGGGATTGG GACAAGGGAG 7380

GTCATAAAGC CTGTCCCCAG GGCACTCTGT GTGAGCACAC GAGACCTCCC 7440

CCGTTAGGTC TCCACACATA GATCTGACCA TTAGGCATTG TGAGGAGGAC TCTACCGCGG 7500

GCTCAGGGAT CACACCAGAG AATCAGGTAC .-.GAGAGGAAG ACGGGGCTCG AGGAGCTGAT 7560

GGATGACACA GAGCAGGGTT CCTGCAGTCC ACAGGTCCAG CTCACCCTGG TGTAGGTGCC 7620

CCATCCTCCT GATCCAGGCA TCCCTGACAC AGCTCCCTCC CGGAGCCTCC TCCCAGGTGA 7680

CACATCAGGG TCCCTCACTC AAGCTGTCCA GAGAGGGCAG CACCTTGGAC AGCGCCCACC 7740

CCACTTCACT CTTCCTCCCT CACAGGGCTC AGGGCTCAGG GCTCAAGTCT CAGAACAAAT 7800

GGCAGACGCC AGTGAGCCCA GAGATGGTGA CAGGGCAATG ATCCAGGGGC AGCTGCCTGA 7860

AACGGGAGCA GGTGAAGCCA CAGATGGGAG AAGATGGTTC AGGAAGAAAA ATCCAGGAAT 7920

GGGCAGGAGA GGAGAGGAGG ACACAGGCTC TGTGGGGCTG CAGCCCAGGA TGGGACTAAG 7980

TGTGAAGACA TCTCAGCAGG TGAGGCCAGG TCCCATGAAC AGAGAAGCAG CTCCCACCTC 8040

CCCTCATGCA CGCACACACA GAGTGTGTGG TGCTGTGCCC CCAGAGTCGG CCTCTCCTGT 8100

TCTGGTCCCC AGCGAGTCAG AAGTGAGGTT GACTTGTCCC TCCTCCTCTC TGCTACCCCA 8160

ACATTCACCT TCTCCTCATG CCCCTCTCTC TCAAATATGA TTTGGATCTA TGTCCCCGCC 8220

CAAATCTCAT GTCAAATTGT AAACCCCAAT GTTGGAGGTG GGGCCTTGTG AGAAGTGATT 8280

GGATAATGCG GGTGGATTTT CTGCTTTGAT CCTGTTTCTG TGATAGAGAT CTCACAXCAT 8340

CTGGTTGTTT AAAAGTGTGT AGCACCTCTC CCCTCTCTCT CTCTCTCTCT TACTCATGCT 8400

CTGCCATGTA AGACGTTCCT GTTTCCCCTT CACCGTCCAG AATGATTGTA AGTTTTCTGA 8460

GGCCTCCCCA GGAGCAGAAG CCACTATGCT TCCTGTACAA CTGCAGAATG ATGAGCGAAT 8520

TAAACCTCTT TTCTTTATAA ATTACCCAGT CTCAGGTATT TCTTTATAGC AATGCGAGGA 8580

CAGACTAATA CAATCTTCTA CTCCCAGATC CCTGCACACG CTTAGCCCCA GACATCACTG 8640

CCCCTGGGAG CATGCACAGC GCAGCCTCCT GCCGACAAAA CCAAACTCAC AAAAGGTGAC 8700

AAAAATCTGC ATTTGGGGAC ATCTGAXTGT GAAAGAGGGA GGACAGTACA CTTGTAGCCA 8760

CAGACACTGG GGCTCACCCA CCXGAAACCT GGTAGCACTT TCGCATAACA TGTGCAXCAC 8820

CCGTGTTCAA TGTCTAGAGA TCAGTGTTGA GTAAAACAGC CTGGTCTGGG GCCGCTCCTG 8880

TCCCCACTTC CCTCCTGTCC ACCACAGGGC GGCACAGTTC CTCCCACCCT GGAGCCTCCC 8940

CAGGGCCTGC TGACCTCCCT CAGCCGGGCC CACAGCCCAG CAGGGTCCAC CCTCACCCGG 9000

GTCACCTCCC CCCACGTCCT CCTCSCCCTC CGAGCTCCTC ACACGGACTC TGTCACCTCC 9060

TCCCTCCAGC CTATCGGCCG CCCACCTGAG GCTTGTCGGC CCCCCACTTG ACGCCT3TCG 9120

GCTGCCCTCT GCAGGCAGCT CCTGTCCCCT ACACCCCCTC CTTCCCCGGG CTCAGCTGAA 9180

AGGGCGTCTC CCAGGGCAGC TCCCTGTGAT CTCCAGGACA GCTCAGTCTC TCACAGGCTC 9240

CGACGCCCCC TATGCTGTCA CCTCACAGCC CTGTCATTAC CATTAACTCC TCAGTCCGAT 9300

GAAGTTCACT GAGCGCCTGT CTCCCGGTTA CAGGAAAACT CTGTGACAGG GACCAC5TCT 9360

GTCCTGCTCT CTGTGGAATC CCAGGGCCCA GCCCAGTGCC TGACACGGAA CAGAT3CTCC 9420

ATAAATACTG GTTAAATGTG TGGGAGATCT CTAAAAAGAA GCATATCACC TCCGTGTGGC 9480

CCCCAGCAGT CAGAGTCTST TCCATGTGGA CACAGGGGCA CTCGCACCAG CATCGGAGGA 9540

GGCCAGCAAG TGCCCGCGGC TGCCCCAGGA ATGAGGCCTC AACCCCCAGA GCTTCAGAAG 9600

GGAGGACAGA GGCCTGCAGG GAATAGATCC TCCSGCCTCA CCCTGCAGCC TAATCCACAG 9660

TTCAGGGTCA GCTCACACCA CGTCGACCCT GGTCAGCATC CCTACGGCAG TTCCACACAA 9720

GGCCGGAGGT CTCCTCTTGC CCTCCAGGGG GTGACATTGC ACACAGACAT CACTCAGGAA 9780

ACGGATTCCC CTGGACAGGA ACCTGGCTTT GCTAAGGAAG TGGAGGTGGA GCCTCGTTTC 9840

CATCCCTTGC TCCAACAGAC CCTTCTGATC TCTCCCACAT ACCTGCTCTG TTCCTTTCTC 9900

GGTCCTATGA GGACCCTCTT CTGCCAGGGG TCCCTGTGCA ACTCCAGACT CCCTCCTGGT 9960

ACCACCATGG GGAAGGTGGG GTGATCACAG GACAGTCAGC CTCGCAGAGA CAGAGACCAC 10020

CCAGGACTGT CAGGGAGAAC ATGGACAGGC CCTGAGCCGC AGCTCAGCCA ACAGACACGG 10080

AGAGGGAGGG TCCCCCTGGA GCCTTCCCCA AGGACAGCAG AGCCCAGAGT CACCCACCTC 10140

CCTCCACCAC AGTCCTCTCT TTCCAGGACA CACAAGACAC CTCCCCCTCC ACATGCAGGA 10200

TCTGGGGACT CCTGAGACCT CTGGGCCTGG GTCTCCATCC CTGGGTCACT GGCGCGGTTG 10260

GTGGTACTGG AGACAGACGG CTCGTCCCTC CCCAGCCACC ACCCAGTGAG CCTTTTTCTA 10320

GCCCCCAGAG CCACCTCTGT CACCTTCCTG TTGGGCATCA TCCCACCTTC CCAGAGCCCT 10380

GGAGAGCATG GGGAGACCCG GGACCCTGCT GGGTTTCTCT GTCACAAAGG AAAATAATCC 10440

CCCTGGTGTG ACAGACCCAA GGACAGAACA CAGCAGAGGT CAGCACTGGG GAAGACAGGT 10500

TGTCCTCCCA GGGGATGGGG GTCCATCCAC CTTGCCGAAA AGATTTGTCT GAGGAACTGA 10560

AAATAGAAGG GAAAAAAGAG GAGGGACAAA AGAGGCAGAA ATGAGAGGGG AGGGGACAGA 10620

GGACACCTGA ATAAAGACCA CACCCATGAC CCACGTGATG CTGAGAAGTA CTCCTCCCCT 10680

AGGAAGAGAC TCAGGGCAGA CGGAGGAACG ACAGCACACC AGACAGTCAC AGCACCCTTG 10740

ACAAAACGTT CCTCGAACTC AAGCTCTTCT CCACAGAGGA GGACAGAGCA GACAGCAGAG 10800

ACCATGGAGT CTCCCTCCGC CCCTCCCCAC ACATGGTGCA TCCCCTGCCA GAGGCTCCTG 10860

CTCACAGGTG AAGGGAGGAC AACCTGGGAG AGGGTCGGAG GAGGGAGCTG GGGTCTCCTG 10920

GGTAGGACAG GGCTGTGAGA CGGACAGAGG GCTCCTGTTG GACCCTGAAT AGGGAAGACG 10980

ACATCAGAGA GGGACAGGAG TCACACCAGA AAAATCAAAT TGAACTGGAA TTCCAAAGGG 11040 

GCAGGAAAAC CTCAAGAGTT CTATTTTCCT AGTTAATTGT CACTCGCCAC TACGTTTTTA 11100 

AAAATCATAA TAACTCCATC AGATGACACT TTAAATAAAA ACATAACCAG GGCATCAAAC 11160 

ACTGTCCTCA TCCGCCTACC GCGGACATTG GAAAATAAGC CCCAGGCTGT GGAGGGCCCT 11220 

GGGAACCCTC ATGAACTCAT CCACAGGAAT CTGCAGCCTG TCCCAGGCAC TGGGGTGCAA 11280 

CCAAGATC 11288 

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3774 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

AAGCTTTTTA GTGCTTTAGA CAGTGAGCTG GTCTGTCTAA CCGAAGTGAC CTGGGCTCCA 60 

TACTCAGCCC CAGAAGTGAA GGGTGAAGCT GGGTGGAGCC AAACCAGGCA AGCCTACCCT 120 

CAGGGCTCCC AGTGGCCTGA CAACCATTGG ACCCAGGACC CATTACTTCT AGGGTAAGGA 180 

AGGTACAAAC ACCAGATCCA ACCATGGTCT GGGGGGACAG CTCTCAAATG CCTAAAAAXA 240 

TACCTGGGAG AGGAGCAGGC AAACTATCAC TGCCCCAGGT TCTCTGAACA GAAAGAGAGG 300 

GGCAACCCAA AGTCCAAATC CAGGTGAGCA GGTGCACCAA ATCCCCAGAG ATATGACGAG 360 

GCAAGAAGTG AAGGAACCAC CCCTGCATCA AATCTTTTGC ATGGGAAGGA GAAGGGGGTT 420 

GCTCATGTTC CCAATCCAGG AGAATGCATT TGGGATCTGC CTTCTTCTCA CTCCTTGGTT 480 

AGCAAGACTA AGCAACCAGG ACTCTGGATT TGGGGAAAGA CGTTTATTTG TGGAGGCCAG 540 

TGATGACAAT CCCACGAGGG CCTAGGTGAA GAGGGCAGGA AGGCTCGAGA CACTGGGGAC 600 

TGAGTGAAAA CCACACCCAT GATCTGCACC ACCCATGGAT GCTCCTTCAT TGCTCACCIT 660 

TCTGTTGATA TCAGATGGCC CCATTTTCTG TACCTTCACA GAAGGACACA GGCTAGGGTC 720 

TGTGCATGGC CTTCATCCCC GGCGCCATGT GAGGACAGCA GGTGGGAAAG ATCATGGGTC 780 

CTCCTGGGTC CTGCAGGGCC AGAACATTCA TCACCCATAC TGACCTCCTA GATGGGAATG 840

GCTTCCCTGG GGCTGGGCCA ACGGGGCCTG GGCAGGGGAG AAAGGACGTC AGGGGACAGG 900 

GAGGAAGGGT CATCGAGACC CAGCCTGGAA GGTTCTTGTC TCTGACCATC CAGGATTTAC 960

TTCCCTGCAT CTACCTTTGG TCATTTTCCC TCAGCAATGA CCAGCTCTGC TTCCTCATCT 1020

CAGCCTCCCA CCCTGGACAC AGCACCCCAG TCCCTGGCCC GGCTGCATCC ACCCAATACC 1080

CTGATAACCC AGGACCCATT ACTTCTAGGG TXaGGAGGGT CCAGGAGACA GAAGCTGAGG 1140

AAAGGTCTGA AGAAGTCACA TCTGTCCTGG CCAGAGGGGA AAAACCATCA GATGCTGAAC 1200

CAGGAGAATC TTGACCCAGG AAAGGGACCG AGGACCCAAG AAAGGAGTCA GACCACCAGG 1260

GTTTGCCTGA GAGGAAGGAT CAAGGCCCCG ACGGAAAGCA GGGCTGGCTG CATGTGCAGG 1320

ACACTCGTGG GGCATATGTG TCTTAGATTC TSCCTGAATT CACT3TCCCT GCCATGGCCA 1380

GACTCTCTAC TCAGGCCTGG ACATGCTGAA ATAGGACAAT GGCCTTGTCC TCTCTCCCCA 1440

CCATTTGGCA AGAGACATAA AGGACATTCC AGGACATGCC TTCTTGGGAG GTCCAGGTTC 1500

TCTGTCTCAC ACCTCAGGGA CTGTAGTTAC TCCATCAGCC ATGGTAGGTG CTGATCTCAC 1560

CCAGCCTGTC CAGGCCCTTT CACTCTCCAC TTTGTGACCA TGTCCAGGAC CACCCCTCAG 1620

ATCCTGAGCC TGCAAATACC CCCTTGCTCG GTGGGTGGAT TCAGTAAACA GTGAGCTCCT 1680

ATCCAGCCCC CAGAGCCACC TCTGTCACCT TTCTGCTCGG CATCATCCCA CCTTCACAAG 1740

CACTAAAGAG CATGGGGAGA CCTGGCTAGC TSGGTTTCTG CATCACAAAG AAAATAATCC 1800

CCCAGGTTCG GATTCCCAGG GCTCTGTATG TGGAGCTGAC AGACCTGAGG CCAGGAGATA 1860

GCAGAGGTCA GCCCTAGGGA GGGTGGGTCA TCCACCCAGG GGACAGGGGT GCACCAGCCT 1920

TGCTACTGAA AGGGCCTCCC CAGGACAGCG CCATCAGCCC TGCCTGAGAG CTTTGCTAAA 1980

CAGCAGTCAG AGGAGGCCAT GGCAGTGGCT GAGCTCCTGC TCCAGGCCCC AACAGACCAG 2040

ACCAACAGCA CAATGCACTC CTTCCCCAAC CTCACAGGTC ACCAAACGGA AACTGACGTG 2100

CTACCTAACC TTAGAGCCAT CAGGGGAGAT AACAGCCCAA TTTCCCAAAC AGGCCAGTTT 2160

CAAXCCCATG ACAATGACCT CTCTGCTCTC ATTCTTCCCA AAATAGGACG CTGATTCTCC 2220

CCCACCATGG ATTTCTCCCT TGTCCCGGGA CCCTTTTCTG CCCCCTATGA TCTGGGCACT 2280

CCTCACACAC ACCTCCTCTC TGGTGACATA TCAGGCTCCC TCACTCTCAA GCAGTCCAGA 2340

AAGGACAGAA CCTTGGACAG CGCCCATCTC AGCTTCACCC TTCCTCCTTC ACAGGCTTCA 2400

GGGCAAAGAA TAAATGGCAG AGGCCACTGA GCCCACAGAT GGTCACAGGC ACTCACCCAG 2460 

GGGCAGAXGC CTGGAGCAGG AGCTGGCGGC GCCACAGGGA GAAGGTGATG CAGCAAGGGA 2520

AACCCAGAAA TGGGCAGGAA AGGAGGACAC AGGCTCTGTG GGGCTGCAGC CCAGGGTTCG 2580 

ACTATGAGTG TGAAGCCAXC TCAGCAAGTA AGGCCAGGTC CCATGAACAA GAGTGGGAGC 2640 

ACGTGGCTTC CTGCTCTCXA TATGGGGTGG GGGATTCCAT GCCCCATAGA ACCAGATGGC 2700 

CGGGGTTCAG ATGGAGAAGG AGCAGGACAG GGGATCCCCA GGATAGGAGG ACCCCAGTGT 2760

CCCCACCCAG GCAGGTGACT GATGAATGGG CATGCAGGGT CCTCCTGGGC TGGCCTCTCC 2820

CTTTGTCCCT CAGGATTCCT TGAAGGAACA TCCGGAAGCC GACCACATCT ACCTCGXGGG 2880

TTCTGGGCAG TCCATGTAAA CCCAGGAGCT TGTGTTCCTA CGACGGGTCA TGGCATGTGC 2940

TGGGGGCACC AAAGAGAGAA ACCTGAGGGC ACGCAGGACC TGCTCTCAGG AGGCATGGGA 3000 

GCCCAGATGG GGAGATGGAT CTCAGGAAAG CCTCCCCCAT CAGGGACGGT GATAGCAATG 3060

GGGGGTCTGT GGGAGZGGZC ACGTCGGATT CCCTCGGCTC TCCCAAGTTC CCTCCCATAG 3120

TCACAACCTG GGGACACT3C CCATGAAGGG GCGCCTTTGC CCACCCAGAT GCTGCTGGTT 3180

CTGCCCATCC ACTACCCTC? CTGCTCCAGC CACTCTGGGT CTTTCTCCAG ATGCCCTGGA 3240

CAGCCCTGGC CTGGGCCT3T CCCCTGAGAG GTGTTGGGAG AAGCTGAGTC TCTGGGGACA 3300

CTCTCAICAG AGTCTSAAAG GCACATCAGG AAACATCCCT GGTCTCCAGG ACTAGGC&AT 3360

GAGGAAAGGG CCCCAGCTCC TCCCTTTGCC ACTGAGAGCG TCGACCCTGG GTGGCCACAG 3420

TGACTTCTGC GTCTGTCCCA GTCACCCTGA AACCACAACA AAACCCCAGC CCCAGACCCT 3480

GCAGGTACAA TACATCT3GG GACAGTCTGT ACCCAGGGGA AGCCAGTTCT CTCTTCCTAG 3540

GAGACCGGGC CTCAGCGCT3 TGCCCGGGGC AGGCGGGGGC AGCACGTGCC TGTCCTTGAG 3600

AACTCGGGAC CTTAAGGGTC TCTGCTCTGT GAGGCACAGC AAGGAXCCTT CTGTCCAGAG 3660

ATGAAAGCAG CTCCTGCCCC TCCTCTGACC TCTTCCTCCT TCCCAAATCT CAACCAACAA 3720

ATAGGTCTTT CAAATCTCAT CATCAAATCT TCATCCATCC ACATCAGAAA GCTT 3774
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CCCTGTGATC TCCAGGACAG CTCAGTCTCC GTCCAATCTC 40
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GTTTCCTCAG TGATGTCTGT GTGCAATG 28

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CCTGCAACTC AAGCTTGAAT TCTCCACAGA GGAGG 35
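
The LENGTH fields of the short sequences (SEQ ID NOs 3 to 5) can be checked mechanically against the listed bases. The following is only an illustrative sketch: the dictionary `LISTED` and the helper names `ungrouped` and `length_matches` are hypothetical and form no part of the specification. It removes the spaces that divide the listing into blocks of ten bases and compares each length with the stated value.

```python
# Illustrative check only; all names here are hypothetical.
# Sequences and stated lengths are copied from SEQ ID NOs 3 to 5 above.
LISTED = {
    3: ("CCCTGTGATC TCCAGGACAG CTCAGTCTCC GTCCAATCTC", 40),
    4: ("GTTTCCTCAG TGATGTCTGT GTGCAATG", 28),
    5: ("CCTGCAACTC AAGCTTGAAT TCTCCACAGA GGAGG", 35),
}


def ungrouped(seq: str) -> str:
    """Remove the spaces that split the listing into blocks of ten bases."""
    return "".join(seq.split()).upper()


def length_matches(seq: str, stated: int) -> bool:
    """Return True if the number of bases equals the stated LENGTH."""
    bases = ungrouped(seq)
    # Reject anything that is not a plain DNA base before counting.
    if set(bases) - set("ACGT"):
        raise ValueError("non-ACGT character in sequence")
    return len(bases) == stated


for seq_id, (seq, stated) in LISTED.items():
    print(seq_id, length_matches(seq, stated))
```

The same check is less useful for the long sequences as reproduced here, because any character that is not A, C, G or T causes the helper to raise an error rather than return a count.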
CLAIMS:

1. A DNA molecule comprising the carcinoembryonic 
antigen (CEA) transcriptional regulatory sequence (TRS) 
but without associated CEA coding sequence. 

2. A molecular chimaera comprising a CEA TRS and a DNA 
sequence operatively linked thereto encoding a 
heterologous enzyme. 

3. A molecular chimaera according to claim 2 wherein the
heterologous enzyme is capable of catalysing the
production of an agent cytotoxic or cytostatic to CEA+
cells.

4. A molecular chimaera according to claim 3 wherein the
heterologous enzyme is cytosine deaminase (CD).

5. A molecular chimaera according to any of claims 2 to 
4 wherein the CEA TRS and the sequence encoding a 
heterologous enzyme are in an expression cassette. 

6. A molecular chimaera according to claim 5 which 
comprises DNA sequence of the coding sequence of the gene 
coding for the heterologous enzyme and additionally 
includes an appropriate polyadenylation sequence which is 
linked downstream in a 3' position and in proper 
orientation to the CEA TRS. 

7. A retroviral shuttle vector comprising a molecular 
chimaera according to any of claims 2 to 6. 

8. A retroviral shuttle vector according to claim 7
comprising a DNA sequence comprising a 5' viral LTR
sequence, a cis acting psi encapsidation sequence, the
molecular chimaera and a 3' viral LTR sequence.



9. A retroviral shuttle vector according to claim 8 
based on Moloney murine leukaemia virus. 

10. A retroviral shuttle vector according to any of 
claims 7 to 9 which is a SIN vector. 

11. An infective virion comprising a retroviral shuttle 
vector according to any of claims 7 to 10, the vector 
being encapsidated within viral proteins to create an 
artificial, infective, replication defective, retrovirus. 

12. A packaging cell line comprising a retroviral 
shuttle vector according to any of claims 7 to 10. 

13. A pharmaceutical composition comprising an infective 
virion according to claim 11 or packaging cell line 
according to claim 12 together with a pharmaceutically 
acceptable carrier. 

14. Use of CEA TRS for targeting expression of a
heterologous enzyme to CEA+ cells.

15. Use according to claim 14 wherein the heterologous
enzyme is capable of catalysing the production of an
agent cytotoxic or cytostatic to CEA+ cells.

16. Use according to claim 15 wherein the heterologous 
enzyme is CD. 

17. A DNA molecule according to claim 1 which comprises
one or more of the following sequence regions of the CEA
gene in either orientation:

about -299b to about +69b, more preferably about -90b to
about +69b;

-14.4kb to -10.6kb, preferably -13.6kb to -10.6kb;
-6.1kb to -3.8kb.

18. A molecular chimaera according to any of claims 2 to
6, retroviral shuttle vector according to any of claims 7 to
10, packaging cell line according to claim 12 or
composition according to claim 13 wherein the CEA TRS
comprises one or more of the following sequence regions
of the CEA gene in either orientation:

about -299b to about +69b, more preferably about -90b to
about +69b;

-14.4kb to -10.6kb, preferably -13.6kb to -10.6kb;
-6.1kb to -3.8kb.

19. Use according to any of claims 14 to 16 wherein the 
CEA TRS comprises one or more of the following sequence 
regions of the CEA gene in either orientation: 

about -299b to about +69b, more preferably about -90b to
about +69b;

-14.4kb to -10.6kb, preferably -13.6kb to -10.6kb;
-6.1kb to -3.8kb.
Plasmid   CEA Coordinates

pCR113    (-299 to +69)
pCR105    (-1661 to +69)
pCRU5     (-14462 to -10691) + (-299 to +69)
pCR148    (-89 to -40) + (-90 to +69)
pCR158    [3 x (-89 to -40)] + (-90 to +69)
pCR136    (-3919 to -6071) + (-299 to +69)
pCR137    (-6071 to -3919) + (-299 to +69)
pCR162    (-13579 to -10691) + (-89 to -40) + (-90 to +69)
pCR163    (-10691 to -13579) + (-89 to -40) + (-90 to +69)



Fig. 5B
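
Several constructs in the table list the same upstream coordinates in opposite order (pCR136 and pCR137, pCR162 and pCR163), which corresponds to the "either orientation" wording of the claims. The following is a minimal sketch of what inverting a fragment means at the sequence level, assuming the inverted insert is read on the same strand as the reverse complement of the original. The function name `reverse_complement` and the short test fragment are hypothetical and are not taken from the specification.

```python
# Minimal sketch; the helper name and test fragment are hypothetical.
# Translation table pairing each base with its complement, in either case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence read 5' to 3' on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


fragment = "AAGCTTGGC"  # hypothetical fragment, not from the listing
inverted = reverse_complement(fragment)
print(inverted)  # GCCAAGCTT
```

Applying the function twice returns the original fragment, so the two orientations carry the same base pairs and differ only in reading direction relative to the promoter.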
-14463 AAGCTTTTTA GTGCTTTAGA CAGTGAGCTG GTCTGTCTAA CCCAAGTGAC CTGGGCTC
-14403 TACTCAGCCC CAGAAGTGAA GGGTGAAGCT GGGTGGAGCC AAACCAGGCA AGCCTACC
-14343 CAGGGCTCCC AGTGGCCTGA GAACCATTGG ACCCAGGACC CATTACTTCT AGGGTAAG
-14283 AGGTACAAAC ACCAGATCCA ACCATGGTCT GGGGGGACAG CTGTCAAATG CCTAAAAA
-14223 TACCTGGGAG AGGAGCAGGC AAACTATCAC TGCCCCAGGT TCTCTGAACA GAAACAGA
-14163 GGCAACCCAA AGTCCAAATC CAGGTGAGCA GGTGCACCAA ATGCCCAGAG ATATGACG
-14103 GCAAGAAGTG AAGGAACCAC CCCTGCATCA AATGTTTTGC ATGGGAAGGA GAAGGGGG
-14043 GCTCATGTTC CCAATCCAGG AGAATGCATT TGGGATCTGC CTTCTTCTCA CTCCTTGG
-13983 AGCAAGACTA AGCAACCAGG ACTCTGGATT TGGGGAAAGA CGTTTATTTG TGGAGGCC
-13923 TGATGACAAT CCCACGAGGG CCTAGGTGAA GAGGGCAGGA AGGCTCGAGA CACTGGGG
-13863 TGAGTGAAAA CCACACCCAT GATCTGCACC ACCCATGGAT GCTCCTTCAT TGCTCACC
-13803 TCTGTTGATA TCAGATGGCC CCATTTTCTG TACCTTCACA GAAGGACACA GGCTAGGG
-13743 TGTGCATGGC CTTCATCCCC GGGGCCATGT GAGGACAGCA GGTGGGAAAG ATCATGGG
-13683 CTCCTGGGTC CTGCAGGGCC AGAACATTCA TCACCCATAC TGACCTCCTA GATGGGAA
-13623 GCTTCCCTGG GGCTGGGCCA ACGGGGCCTG GGCAGGGGAG AAAGGACGTC AGGGGACA
-13563 GAGGAAGGGT CATCGAGACC CAGCCTGGAA GGTTCTTGTC TCTGACCATC CAGGATTT
-13503 TTCCCTGCAT CTACCTTTGG TCATTTTCCC TCAGCAATGA CCAGCTCTGC TTCCTGAT
-13443 CAGCCTCCCA CCCTGGACAC AGCACCCCAG TCCCTGGCCC GGCTGCATCC ACCCAATA
-13383 CTGATAACCC AGGACCCATT ACTTCTAGGG TAAGGAGGGT CCAGGAGACA GAAGCTGA
-13323 AAAGGTCTGA AGAAGTCACA TCTGTCCTGG CCAGAGGGGA AAAACCATCA GATGCTGA
-13263 CAGGAGAATG TTGACCCAGG AAAGGGACCG AGGACCCAAG AAAGGAGTCA GACCACCA
-13203 GTTTGCCTGA GAGGAAGGAT CAAGGCCCCG AGGGAAAGCA GGGCTGGCTG CATGTGCA
-13143 ACACTGGTGG GGCATATGTG TCTTAGATTC TCCCTGAATT CAGTGTCCCT GCCATGGC
-13083 GACTCTCTAC TCAGGCCTGG ACATGCTGAA ATAGGACAAT GGCCTTGTCC TCTCTCCC
-13023 CCATTTGGCA AGAGACATAA AGGACATTCC AGGACATGCC TTCCTGGGAG GTCCAGGT
-12963 TCTGTCTCAC ACCTCAGGGA CTGTAGTTAC TGCATCAGCC ATGGTAGGTG CTGATCTC
-12903 CCAGCCTGTC CAGGCCCTTC CACTCTCCAC TTTGTGACCA TGTCCAGGAC CACCCCTC
-12843 ATCCTGAGCC TGCAAATACC CCCTTGCTGG GTGGGTGGAT TCAGTAAACA GTGAGCTC
-12783 ATCCAGCCCC CAGAGCCACC TCTGTCACCT TCCTGCTGGG CATCATCCCA CCTTCACA
-12723 CACTAAAGAG CATGGGGAGA CCTGGCTAGC TGGGTTTCTG CATCACAAAG AAAATAAT
-12663 CCCAGGTTCG GATTCCCAGG GCTCTGTATG TGGAGCTGAC AGACCTGAGG CCAGGAGA
-12603 GCAGAGGTCA GCCCTAGGGA GGGTGGGTCA TCCACCCAGG GGACAGGGGT GCACCAGC
-12543 TGCTACTGAA AGGGCCTCCC CAGGACAGCG CCATCAGCCC TGCCTGAGAG CTTTGCTA
-12483 CAGCAGTCAG AGGAGGCCAT GGCAGTGGCT GAGCTCCTGC TCCAGGCCCC AACAGACC
-12423 ACCAACAGCA CAATGCAGTC CTTCCCCAAC GTCACAGGTC ACCAAAGGGA AACTGAGG
-12363 CTACCTAACC TTAGAGCCAT CAGGGGAGAT AACAGCCCAA TTTCCCAAAC AGGCCAGT
-12303 CAATCCCATG ACAATGACCT CTCTGCTCTC ATTCTTCCCA AAATAGGACG CTGATTCT
-12243 CCCACCATGG ATTTCTCCCT TGTCCCGGGA GCCTTTTCTG CCCCCTATGA TCTGGGCA
-12183 CCTGACACAC ACCTCCTCTC TGGTGACATA TCAGGGTCCC TCACTGTCAA GCAGTCCA
-12123 AAGGACAGAA CCTTGGACAG CGCCCATCTC AGCTTCACCC TTCCTCCTTC ACAGGGTT
-12063 GGGCAAAGAA TAAATGGCAG AGGCCAGTGA GCCCAGAGAT GGTGACAGGC AGTGACCC
-12003 GGGCAGATGC CTGGAGCAGG AGCTGGCGGG GCCACAGGGA GAAGGTGATG CAGGAAGG
-11943 AACCCAGAAA TGGGCAGGAA AGGAGGACAC AGGCTCTGTG GGGCTGCAGC CCAGGGTT
-11883 ACTATGAGTG TGAAGCCATC TCAGCAAGTA AGGCCAGGTC CCATGAACAA GAGTGGGA
-11823 ACGTGGCTTC CTGCTCTGTA TATGGGGTGG GGGATTCCAT GCCCCATAGA ACCAGATG
-11763 CGGGGTTCAG ATGGAGAAGG AGCAGGACAG GGGATCCCCA GGATAGGAGG ACCCCAGT
-11703 CCCCACCCAG GCAGGTGACT GATGAATGGG CATGCAGGGT CCTCCTGGGC TGGGCTCT
-11643 CTTTGTCCCT CAGGATTCCT TGAAGGAACA TCCGGAAGCC GACCACATCT ACCTGGTG
-11583 TTCTGGGGAG TCCATGTAAA GCCAGGAGCT TGTGTTGCTA GGAGGGGTCA TGGCATGT
-11523 TGGGGGCACC AAAGAGAGAA ACCTGAGGGC AGGCAGGACC TGGTCTGAGG AGGCATGG
-11463 GCCCAGATGG GGAGATGGAT GTCAGGAAAG GCTGCCCCAT CAGGGAGGGT GATAGCAA
-11403 GGGGGTCTGT GGGAGTGGGC ACGTGGGATT CCCTGGGCTC TGCCAAGTTC CCTCCCAT
-11343 TCACAACCTG GGGACACTGC CCATGAAGGG GCGCCTTTGC CCAGCCAGAT GCTGCTGG
-11283 CTGCCCATCC ACTACCCTCT CTGCTCCAGC CACTCTGGGT CTTTCTCCAG ATGCCCTG
-11223 CAGCCCTGGC CTGGGCCTGT CCCCTGAGAG GTGTTGGGAG AAGCTGAGTC TCTGGGGA
-11163 CTCTCATCAG AGTCTGAAAG GCACATCAGG AAACATCCCT GGTCTCCAGG ACTAGGCA
-11103 GAGGAAAGGG CCCCAGCTCC TCCCTTTGCC ACTGAGAGGG TCGACCCTGG GTGGCCAC
-11043 TGACTTCTGC GTCTGTCCCA GTCACCCTGA AACCACAACA AAACCCCAGC CCCAGACC
-10983 GCAGGTACAA TACATGTGGG GACAGTCTGT ACCCAGGGGA AGCCAGTTCT CTCTTCCT
-10923 GAGACCGGGC CTCAGGGCTG TGCCCGGGGC AGGCGGGGGC AGCACGTGCC TGTCCTTG
-10863 AACTCGGGAC CTTAAGGGTC TCTGCTCTGT GAGGCACAGC AAGGATCCTT CTGTCCAG
-10803 ATGAAAGCAG CTCCTGCCCC TCCTCTGACC TCTTCCTCCT TCCCAAATCT CAACCAAC
-10743 ATAGGTGTTT CAAATCTCAT CATCAAATCT TCATCCATCC ACATGAGAAA GCTTAAAA
-10683 CAATGGATTG ACAACATCAA GAGTTGGAAC AAGTGGACAT GGAGATGTTA CTTGTGGA
-10623 TTTAGATGTG TTCAGCTATC GGGCAGGAGA ATCTGTGTCA AATTCCAGCA TGGTTCAG
-10563 GAATCAAAAA GTGTCACAGT CCAAATGTGC AACAGTGCAG GGGATAAAAC TGTGGTGC
-10503 TCAAACTGAG GGATATTTTG GAACATGAGA AAGGAAGGGA TTGCTGCTGC ACAGAACA
-10443 GATGATCTCA CACATAGAGT TGAAAGAAAG GAGTCAATCG CAGAATAGAA AATGATCA
-10383 AATTCCACCT CTATAAAGTT TCCAAGAGGA AAACCCAATT CTGCTGCTAG AGATCAGA
-10323 GGAGGTGACC TGTGCCTTGC AATGGCTGTG AGGGTCACGG GAGTGTCACT TAGTGCAG
-10263 AATGTGCCGT ATCTTAATCT GGGCAGGGCT TTCATGAGCA CATAGGAATG CAGACATT
-10203 TGCTGTGTTC ATTTTACTTC ACCGGAAAAG AAGAATAAAA TCAGCCGGGC GCGGTGGC
-10143 ACGCCTGTAA TCCCAGCACT TTAGAAGGCT GAGGTGGGCA GATTACTTGA GGTCAGGA
-10083 TCAAGACCAC CCTGGCCAAT ATGGTGAAAC CCCGGCTCTA CTAAAAATAC AAAAATTA
-10023 TGGGCATGGT GGTGCGCGCC TGTAATCCCA GCTACTCGGG AGGCTGAGGC TGGACAAT
-9963 CTTGGACCCA GGAAGCAGAG GTTGCAGTGA GCCAAGATTG TGCCACTGCA CTCCAGCT
-9903 GGCAACAGAG CCAGACTCTG TAAAAAAAAA AAAAAAAAAA AAAAAAAGAA AGAAAGAA
-9843 AGAAAAGAAA GTATAAAATC TCTTTGGGTT AACAAAAAAA GATCCACAAA ACAAACAC
-9783 GCTCTTATCA AACTTACACA ACTCTGCCAG AGAACAGGAA ACACAAATAC TCATTAAC
-9723 ACTTTTGTGG CAATAAAACC TTCATGTCAA AAGGAGACCA GGACACAATG AGGAAGTA
-9663 ACTGCAGGCC CTACTTGGGT GCAGAGAGGG AAAATCCACA AATAAAACAT TACCAGAA
-9603 AGCTAAGATT TACTGCATTG AGTTCATTCC CCAGGTATGC AAGGTGATTT TAACACCT
-9543 AAATCAATCA TTGCCTTTAC TACATAGACA GATTAGCTAG AAAAAAATTA CAACTAGC
-9483 AACAGAAGCA ATTTGGCCTT CCTAAAATTC CACATCATAT CATCATGATG GAGACAGT
-9423 AGACGCCAAT GACAATAAAA AGAGGGACCT CCGTCACCCG GTAAACATGT CCACACAG
-9363 CCAGCAAGCA CCCGTCTTCC CAGTGAATCA CTGTAACCTC CCCTTTAATC AGCCCCAG
-9303 AAGGCTGCCT GCGATGGCCA CACAGGCTCC AACCCGTGGG CCTCAACCTC CCGCAGAG
-9243 TCTCCTTTGG CCACCCCATG GGGAGAGCAT GAGGACAGGG CAGAGCCCTC TGATGCCC
-9183 ACATGGCAGG AGCTGACGCC AGAGCCATGG GGGCTGGAGA GCAGAGCTGC TGGGGTCA
-9123 GCTTCCTGAG GACACCCAGG CCTAAGGGAA GGCAGCTCCC TGGATGGGGG CAACCAGG
-9063 CCGGGCTCCA ACCTCAGAGC CCGCATGGGA GGAGCCAGCA CTCTAGGCCT TTCCTAGG
-9003 GACTCTGAGG GGACCCTGAC ACGACAGGAT CGCTGAATGC ACCCGAGATG AAGGGGCC
-8943 CACGGGACCC TGCTCTCGTG GCAGATCAGG AGAGAGTGGG ACACCATGCC AGGCCCCC
-8883 GGCATGGCTG CGACTGACCC AGGCCACTCC CCTGCATGCA TCAGCCTCGG TAAGTCAC
-8823 GACCAAGCCC AGGACCAATG TGGAAGGAAG GAAACAGCAT CCCCTTTAGT GATGGAAC
-8763 AAGGTCAGTG CAAAGAGAGG CCATGAGCAG TTAGGAAGGG TGGTCCAACC TACAGCAC
-8703 ACCATCATCT ATCATAAGTA GAAGCCCTGC TCCATGACCC CTGCATTTAA ATAAACGT
-8643 GTTAAATGAG TCAAATTCCC TCACCATGAG AGCTCACCTG TGTGTAGGCC CATCACAC
-8583 ACAAACACAC ACACACACAC ACACACACAC ACACACACAC ACAGGGAAAG TGCAGGAT
-8523 TGGACAGCAC CAGGCAGGCT TCACAGGCAG AGCAAACAGC GTGAATGACC CATGCAGT
-8463 CCTGGGCCCC ATCAGCTCAG AGACCCTGTG AGGGCTGAGA TGGGGCTAGG CAGGGGAG
-8403 ACTTAGAGAG GGTGGGGCCT CCAGGGAGGG GGCTGCAGGG AGCTGGGTAC TGCCCTCC
-8343 GGAGGGGGCT GCAGGGAGCT GGGTACTGCC CTCCAGGGAG GGGGCTGCAG GGAGCTGG
-8283 ACTGCCCTCC AGGGAGGGGG CTGCAGGGAG CTGGGTACTG CCCTCCAGGG AGGGGGCT
-8223 AGGGAGCTGG GTACTGCCCT CCAGGGAGGC AGGAGCACTG TTCCCAACAG AGAGCACA
-8163 TTCCTGCAGC AGCTGCACAG ACACAGGAGC CCCCATGACT GCCCTGGGCC AGGGTGTG
-8103 TTCCAAATTT CGTGCCCCAT TGGGTGGGAC GGAGGTTGAC CGTGACATCC AAGGGGCA
-8043 TGTGATTCCA AACTTAAACT ACTGTGCCTA CAAAATAGGA AATAACCCTA CTTTTTCT
-7983 TATCTCAAAT TCCCTAAGCA CAAGCTAGCA CCCTTTAAAT CAGGAAGTTC AGTCACTC
-7923 GGGGTCCTCC CATGCCCCCA GTCTGACTTG CAGGTGCACA GGGTGGCTGA CATCTGTC
-7863 TGCTCCTCCT CTTGGCTCAA CTGCCGCCCC TCCTGGGGGT GACTGATGGT CAGGACAA
-7803 GATCCTAGAG CTGGCCCCAT GATTGACAGG AAGGCAGGAC TTGGCCTCCA TTCTGAAG
-7743 TAGGGGTGTC AAGAGAGCTG GGCATCCCAC AGAGCTGCAC AAGATGACGC GGACAGAG
-7683 TGACACAGGG CTCAGGGCTT CAGACGGGTC GGGAGGCTCA GCTGAGAGTT CAGGGACA
-7623 CCTGAGGAGC CTCAGTGGGA AAAGAAGCAC TGAAGTGGGA AGTTCTGGAA TGTTCTGG
-7563 AAGCCTGAGT GCTCTAAGGA AATGCTCCCA CCCCGATGTA GCCTGCAGCA CTGGACGG
-7503 TGTGTACCTC CCCGCTGCCC ATCCTCTCAC AGCCCCCGCC TCTAGGGACA CAACTCCT
-7443 CCTAACATGC ATCTTTCCTG TCTCATTCCA CACAAAAGGG CCTCTGGGGT CCCTGTTC
-7383 CATTGCAAGG AGTGGAGGTC ACGTTCCCAC AGACCACCCA GCAACAGGGT CCTATGGA
-7323 TGCGGTCAGG AGGATCACAC GTCCCCCCAT GCCCAGGGGA CTGACTCTGG GGGTGATG
-7263 TTGGCCTGGA GGCCACTGGT CCCCTCTGTC CCTGAGGGGA ATCTGCACCC TGGAGGCT
-7203 CACATCCCTC CTGATTCTTT CAGCTGAGGG CCCTTCTTGA AATCCCAGGG AGGACTCA
-7143 CCCCACTGGG AAAGGCCCAG TGTGGACGGT TCCACAGCAG CCCAGCTAAG GCCCTTGG
-7083 ACAGATCCTG AGTGAGAGAA CCTTTAGGGA CACAGGTGCA CGGCCATGTC CCCAGTGC
-7023 ACACAGAGCA GGGGCATCTG GACCCTGAGT GTGTAGCTCC CGCGACTGAA CCCAGCCC
-6963 CCCCAATGAC GTGACCCCTG GGGTGGCTCC AGGTCTCCAG TCCATGCCAC CAAAATCT
-6903 AGATTGAGGG TCCTCCCTTG AGTCCCTGAT GCCTGTCCAG GAGCTGCCCC CTGAGCAA
-6843 CTAGAGTGCA GAGGGCTGGG ATTGTGGCAG TAAAAGCAGC CACATTTGTC TCAGGAAG
-6783 AAGGGAGGAC ATGAGCTCCA GGAAGGGCGA TGGCGTCCTC TAGTGGGCGC CTCCTGTT
-6723 TGAGCAAAAA GGGGCCAGGA GAGTTGAGAG ATCAGGGCTG GCCTTGGACT AAGGCTCA
-6663 TGGAGAGGAC TGAGGTGCAA AGAGGGGGCT GAAGTAGGGG AGTGGTCGGG AGAGATGG
-6603 GGAGCAGGTA AGGGGAAGCC CCAGGGAGGC CGGGGGAGGG TACAGCAGAG CTCTCCAC
-6543 CTCAGCATTG ACATTTGGGG TGGTCGTGCT AGTGGGGTTC TGTAAGTTGT AGGGTGTT
-6483 GCACCATCTG GGGACTCTAC CCACTAAATG CCAGCAGGAC TCCCTCCCCA AGCTCTAA
-6423 ACCAACAATG TCTCCAGACT TTCCAAATGT CCCCTGGAGA GCAAAATTGC TTCTGGCA
-6363 ATCACTGATC TACGTCAGTC TCTAAAAGTG ACTCATCAGC GAAATCCTTC ACCTCTTG
-6303 AGAAGAATCA CAAGTGTGAG AGGGGTAGAA ACTGCAGACT TCAAAATCTT TCCAAAAG
-6243 TTTTACTTAA TCAGCAGTTT GATGTCCCAG GAGAAGATAC ATTTAGAGTG TTTAGAGT
-6183 ATGCCACATG GCTGCCTGTA CCTCACAGCA GGAGCAGAGT GGGTTTTCCA AGGGCCTG
-6123 ACCACAACTG GAATGACACT CACTGGGTTA CATTACAAAG TGGAATGTGG GGAATTCT
-6063 AGACTTTGGG AAGGGAAATG TATGACGTGA GCCCACAGCC TAAGGCAGTG GACAGTCC
-6003 TTTGAGGCTC TCACCATCTA GGAGACATCT CAGCCATGAA CATAGCCACA TCTGTCAT
-5943 GAAAACATGT TTTATTAAGA GGAAAAATCT AGGCTAGAAG TGCTTTATGC TCTTTTTT
-5883 CTTTATGTTC AAATTCATAT ACTTTTAGAT CATTCCTTAA AGAAGAATCT ATCCCCCT
-5823 GTAAATGTTA TCACTGACTG GATAGTGTTG GTGTCTCACT CCCAACCCCT GTGTGGTG
-5763 AGTGCCCTGC TTCCCCAGCC CTGGGCCCTC TCTGATTCCT GAGAGCTTTG GGTGCTCC
-5703 CATTAGGAGG AAGAGAGGAA GGGTGTTTTT AATATTCTCA CCATTCACCC ATCCACCT
-5643 TAGACACTGG GAAGAATCAG TTGCCCACTC TTGGATTTGA TCCTCGAATT AATGACCT
-5583 ATTTCTGTCC CTTGTCCATT TCAACAATGT GACAGGCCTA AGAGGTGCCT TCTCCATG
-5523 ATTTTTGAGG AGAAGGTTCT CAAGATAAGT TTTCTCACAC CTCTTTGAAT TACCTCCA
-5463 TGTGTCCCCA TCACCATTAC CAGCAGCATT TGGACCCTTT TTCTGTTAGT CAGATGCT
-5403 CCACCTCTTG AGGGTGTATA CTGTATGCTC TCTACACAGG AATATGCAGA GGAAATAG
-5343 AAAGGGAAAT CGCATTACTA TTCAGAGAGA AGAAGACCTT TATGTGAATG AATGAGAG
-5283 TAAAATCCTA AGAGAGCCCA TATAAAATTA TTACCAGTGC TAAAACTACA AAAGTTAC
-5223 TAACAGTAAA CTAGAATAAT AAAACATGCA TCACAGTTGC TGGTAAAGCT AAATCAGA
-5163 TTTTTTTCTT AGAAAAAGCA TTCCATGTGT GTTGCAGTGA TGACAGGAGT GCCCTTCA
-5103 CAATATGCTG CCTGTAATTT TTGTTCCCTG GCAGAATGTA TTGTCTTTTC TCCCTTTA
-5043 TCTTAAATGC AAAACTAAAG GCAGCTCCTG GGCCCCCTCC CCAAAGTCAG CTGCCTGC
-4983 CCAGCCCCAC GAAGAGCAGA GGCCTGAGCT TCCCTGGTCA AAATAGGGGG CTAGGGAG
-4923 TAACCTTGCT CGATAAAGCT GTGTTCCCAG AATGTCGCTC CTGTTCCCAG GGGCACCA
-4863 CTGGAGGGTG GTGAGCCTCA CTGGTGGCCT GATGCTTACC TTGTGCCCTC ACACCAGT

-4803 TCACTGGAAC CTTGAACACT TGGCTGTCGC CCGGATCTGC AGATGTCAAG AACTTCTG 

-4743 AGTCAAATTA CTGCCCACTT CTCCAGGGCA GATACCTGTG AACATCCAAA ACCATGCC 

-4683 AGAACCCTGC CTGGGGTCTA CAACACATAT GGACTGTGAG CACCAAGTCC AGCCCTGA 

-4623 CTGTGACCAC CTGCCAAGAT GCCCCTAACT GGGATCCACC AATCACTGCA CATGGCAG 

-4563 AGCGAGGCTT GGAGGTGCTT CGCCACAAGG CAGCCCCAAT TTGCTGGGAG TTTCTTGG 

-4503 CCTGGTAGTG GTGAGGAGCC TTGGGACCCT CAGGATTACT CCCCTTAAGC ATAGTGGG 

-4443 CCCTTCTGCA TCCCCAGCAG GTGCCCCGCT CTTCAGAGCC TCTCTCTCTG AGGTTTAC 

-4383 AGACCCCTGC ACCAATGAGA CCATGCTGAA GCCTCAGAGA GAGAGATGGA GCTTTGAC 

-4323 GGAGCCGCTC TTCCTTGAGG GCCAGGGCAG GGAAAGCAGG AGGCAGCACC AGGAGTGG 

-4263 ACACCAGTGT CTAAGCCCCT GATGAGAACA GGGTGGTCTC TCCCATATGC CCATACCA 

-4203 CCTGTGAACA GAATCCTCCT TCTGCAGTGA CAATGTCTGA GAGGACGACA TGTTTCCC 

-4143 CCTAACGTGC AGCCATGCCC ATCTACCCAC TGCCTACTGC AGGACAGCAC CAACCCAG 

-4083 GCTGGGAAGC TGGGAGAAGA CATGGA? TAC CCATGGCTTC TCACCTTCCT CCAGTCCA 

-4023 GGGCACCATT TATGCCTAGG ACACCCACCT GCCGGCCCCA GGCTCTTAAG AGTTAGGT 

-3963 CCTAGGTGCC TCTGGGAGGC CGAGGCAGGA GAATTGCTTG AACCCGGGAG GCAGAGGT 

-3903 CAGTGAGCCG AGATCACACC ACTGCACTCC AGCCTGGGTG ACAGAATGAG ACTCTGTC 

-3843 AAAAAAAAAG AGAAAGATAG CATCAGTGGC TACCAAGGGC TAGGGGCAGG GGAAGGTG 

-3783 GAGTTAATGA TTAATAGTAT GAAGTTTCTA TGTGAGATGA TGAAAATGTT CTGGAAAA 

-3723 AAATATAGTG GTGAGGATGT AGAATATTGT GAATATAATT AACGGCATTT AATTGTAC 

-3663 TTAACATGAT TAATGTGGCA TATTTTATCT TATGTATTTG ACTACATCCA AGAAACAC 

-3603 GGAGAGGGAA AGCCCACCAT GTAAAATACA CCCACCCTAA TCAGATAGTC CTCATTGT 

-3543 CCAGGTACAG GCCCCTCATG ACCTGCACAG GAATAACTAA GGATTTAAGG ACATGAGG 

Fig. 6 (8/11) 
TCCCAGCCAA CTGCAGGTGC ACAACATAAA TGTATCTGCA AACAGACTGA GAGTAAAG
GGGGGCACAA ACCTCAGCAC TGCCAGGACA CACACCCTTC TCGTGGATTC TGACTTTA 
TGACCCGGCC CACTGTCCAG ATCTTGTTGT GGGATTGGGA CAAGGGAGGT CATAAAGC 
GTCCCCAGGG CACTCTGTGT GAGCACACGA GACCTCCCCA CCCCCCCACC GTTAGGTC 
CACACATAGA TCTGACCATT AGGCATTGTG AGGAGGACTC TAGCGCGGGC TCAGGGAT 
CACCAGAGAA TCAGGTACAG AGAGGAAGAC GGGGCTCGAG GAGCTGATGG ATGACACA 
GCAGGGTTCC TGCAGTCCAC AGGTCCAGCT CACCCTGGTG TAGGTGCCCC ATCCCCCT 
TCCAGGCATC CCTGACACAG CTCCCTCCCG GAGCCTCCTC CCAGGTGACA CATCAGGG 
CCTCACTCAA GCTGTCCAGA GAGGGCAGCA CCTTGGACAG CGCCCACCCC ACTTCACT 
TCCTCCCTCA CAGGGCTCAG GGCTCAGGGC TCAAGTCTCA GAACAAATGG CAGAGGCC 
TGAGCCCAGA GATGGTGACA GGGCAATGAT CCAGGGGCAG CTGCCTGAAA CGGGAGCA 
TGAAGCCACA GATGGGAGAA GATGGTTCAG GAAGAAAAAT CCAGGAATGG GCAGGAGA 
AGAGGAGGAC ACAGGCTCTG TGGGGCTGCA GCCCAGGATG GGACTAAGTG TGAAGACA
TCAGCAGGTG AGGCCAGGTC CCATGAACAG AGAAGCkGCT CCCACCTCCC CTGATGCA 
GACACACAGA GTGTGTGGTG CTGTGCCCCC AGAGTCGGGC TCTCCTGTTC TGGTCCCC 
GGAGTGAGAA GTGAGGTTGA CTTGTCCCTG CTCCTCTCTG CTACCCCAAC ATTCACCT 
TCCTCATGCC CCTCTCTCTC AAATATGATT TGGATCTATG TCCCCGCCCA AATCTCAT 
CAAATTGTAA ACCCCAATGT TGGAGGTGGG GCCTTGTGAG AAGTGATTGG ATAATGCG 
TGGATTTTCT GCTTTGATGC TGTTTCTGTG ATAGAGATCT CACATGATCT GGTTGTTT 
AAGTGTGTAG CACCTCTCCC CTCTCTCTCT CTCTCTCTTA CTCATGCTCT GCCATGTA 
ACGTTCCTGT TTCCCCTTCA CCGTCCAGAA TGATTGTAAG TTTTCTGAGG CCTCCCCA 
AGCAGAAGCC ACTATGCTTC CTGTACAACT GCAGAATGAT GAGCGAATTA AACCTCTT 
CTTTATAAAT TACCCAGTCT CAGGTATTTC TTTATAGCAA TGCGAGGACA GACTAATA 
ATCTTCTACT CCCAGATCCC CGCACACGCT TAGCCCCAGA CATCACTGCC CCTGGGAG
TGCACAGCGC AGCCTCCTGC CGACAAAAGC AAAGTCACAA AAGGTGACAA AAATCTGC 
TTGGGGACAT CTGATTGTGA AAGAGGGAGG ACAGTACACT TGTAGCCACA GAGACTGG 
CTCACCGAGC TGAAACCTGG TAGCACTTTG GCATAACATG TGCATGACCC GTGTTCAA 
TCTAGAGATC AGTGTTGAGT AAAACAGCCT GGTCTGGGGC CGCTGCTGTC CCCACTTC 
TCCTGTCCAC CAGAGGGCGG CAGAGTTCCT CCCACCCTGG AGCCTCCCCA GGGGCTGC 
ACCTCCCTCA GCCGGGCCCA CAGCCCAGCA GGGTCCACCC TCACCCGGGT CACCTCGG 
CACGTCCTCC TCGCCCTCCG AGCTCCTCAC ACGGACTCTG TCAGCTCCTC CCTGCAGC 
ATCGGCCGCC CACCTGAGGC TTGTCGGCCG CCCACTTGAG GCCTGTCGGC TGCCCTCT 
AGGCAGCTCC TGTCCCCTAC ACCCCCTCCT TCCCCGGGCT CAGCTGAAAG GGCGTCTC 
AGGGCAGCTC CCTGTGATCT CCAGGACAGC TCAGTCTCTC ACAGGCTCCG ACGCCCCC 
TGCTGTCACC TCACAGCCCT GTCATTACCA TTAACTCCTC AGTCCCATGA AGTTCACT 
GCGCCTGTCT CCCGGTTACA GGAAAACTCT GTGACAGGGA CCACGTCTGT CCTGCTCT 
GTGGAATCCC AGGGCCCAGC CCAGTGCCTG ACACGGAACA GATGC1 CCAT AAATACTG 
TAAATGTGTG GGAGATCTCT AAAAAGAAGC ATATCACCTC CGTGTSGCCC CCAGCAGT 
GAGTCTGTTC CATGTGGACA CAGGGGCACT GGCACCAGCA TGGGAGGAGG CCAGCAAG 
CCCGCGGCTG CCCCAGGAAT GAGGCCTCAA CCCCCAGAGC TTCAGAAGGG AGGACAGA 
CCTGCAGGGA ATAGATCCTC CGGCCTGACC CTGCAGCCTA ATCCAGAGTT CAGGGTCA 
TCACACCACG TCGACCCTGG TCAGCATCCC TAGGGCAGTT CCAGACAAGG CCGGAGGT 
CCTCTTGCCC TCCAGGGGGT GACATTGCAC ACAGACATCA CTCAGGAAAC GGATTCCC 
GGACAGGAAC CTGGCTTTGC TAAGGAAGTG GAGGTGGAGC CTGGTTTCCA TCCCTTGC 
CAACAGACCC TTCTGATCTC TCCCACATAC CTGCTCTGTT CCTTTCTGGG TCCTATGA 
ACCCTGTTCT GCCAGGGGTC CCTGTGCAAC TCCAGACTCC CTCCTGGTAC CACCATGG 
AAGGTGGGGT GATCACAGGA CAGTCAGCCT CGCAGAGACA GAGACCACCC AGGACTGT
GGGAGAACAT GGACAGGCCC TGAGCCGCAG CTCAGCCAAC AGACACGGAG AGGGAGGG 
CCCCTGGAGC CTTCCCCAAG GACAGCAGAG CCCAGAGTCA CCCACCTCCC TCCACCAC 
TCCTCTCTTT CCAGGACACA CAAGACACCT CCCCCTCCAC ATGCAGGATC TGGGGACT 
TGAGACCTCT GGGCCTGGGT CTCCATCCCT GGGTCAGTGG CGGGGTTGGT GGTACTGG 
ACAGAGGGCT GGTCCCTCCC CAGCCACCAC CCAGTGAGCC TTTTTCTAGC CCCCAGAG 
ACCTCTGTCA CCTTCCTGTT GGGCATCATC CCACCTTCCC AGAGCCCTGG AGAGCATG 
GAGACCCGGG ACCCTGCTGG GTTTCTCTGT CACAAAGGAA AATAATCCCC CTGGTGTG 
AGACCCAAGG ACAGAACACA GCAGAGGTCA GCACTGGGGA AGACAGGTTG TCCTCCCA 
GGATGGGGGT CCATCCACCT TGCCGAAAAG ATTTGTCTGA GGAACTGAAA ATAGAAGG 
AAAAAGAGGA GGGACAAAAG AGGCAGAAAT GAGAGGGGAG GGGACAGAGG ACACCTGA 
AAAGACCACA CCCATGACCC ACGTGATGCT GAGAAGTACT CCTGCCCTAG GAAGAGAC 
AGGGCAGAGG GAGGAAGGAC AGCAGACCAG ACAGTCACAG CAGCCTTGAC AAAACGTT 
TGGAACTCAA GCTCTTCTCC ACAGAGGAGG ACAGAGCAGA CAGCAGAGAC CATGGAGT
CCCTCGGCCC CTCCCCACAG ATGGTGCATC CCCTGGCAGA GGCTCCTGCT CACAGGTG
GGGAGGACAA CCTGGGAGAG GGTGGGAGGA GGGAGCTGGG GTCTCCTGGG TAGGACAG 
CTGTGAGACG GACAGAGGGC TCCTGTTGGA GCCTGAATAG GGAAGAGGAC ATCAGAGA 
GACAGGAGTC ACACCAGAAA AATCAAATTG AACTGGAATT GGAAAGGGGC AGGAAAAC 
CAAGAGTTCT ATTTTCCTAG TTAATTGTCA CTGGCCACTA CGTTTTTAAA AATCATAA 
ACTGCATCAG ATGACACTTT AAATAAAAAC ATAACCAGGG CATGAAACAC TGTCCTCA 
CGCCTACCGC GGACATTGGA AAATAAGCCC CAGGCTGTGG AGGGCCCTGG GAACCCTC 
GAACTCATCC ACAGGAATCT GCAGCCTGTC CCAGGCACTG GGGTGCAACC AAGATC 